Moderna’s secure and scalable data pipelines using Etleap

Moderna’s data engineering leader Carlos Peralta recently presented how they have handled the exponential data growth at the biotech firm over the past two years. 

  • Case studies
  • Engineering
  • ETL

Challenges

Moderna faced three main challenges in supporting its scientists with the full breadth of its data.

  1. Managing a wide range of data sources

  2. Operating in a highly secure environment

  3. Managing rapid growth in data, and in demand for it, with a small team of data engineers

Solution

The team has navigated these challenges with the help of a modern data architecture, anchored by an Amazon Redshift data warehouse and Etleap for data transformations and pipelines.

Data Sources

Peralta’s team faced nearly 100 internal applications as siloed data sources. Some partners, such as drug manufacturers, provide their data to Moderna through file servers and databases. In the past, the team tried one-off tools to ingest a given data source, but found that approach did not scale. By unifying its pipeline process, the team has become far more efficient and has been able to tap into the expertise of the Etleap support team for troubleshooting and strategy.

Moderna has used the Etleap user interface to build pipelines from S3 buckets, file stores, and hundreds of SQL and NoSQL databases such as SQL Server, Postgres, and MongoDB. Most of these were available “out of the box” with Etleap, and Etleap quickly added integration support for Moderna’s non-standard sources.

Security

Data is highly sensitive and heavily regulated in the health and life sciences industry. Moderna’s vendors must meet stringent audit and change management compliance requirements. Moderna also cannot use traditional SaaS workflows where its data would leave its own virtual private cloud (VPC). While Etleap is most commonly deployed as SaaS, it is also one of the few ETL tools available for deployment in a company’s VPC. A key part of this deployment is sending de-identified diagnostic data to Etleap, which enables Etleap to provide the same proactive support it delivers for SaaS customers.

Etleap’s focus on security has given Moderna comfort and flexibility. For instance, it has allowed Moderna to isolate ingest and transformation workloads from other AWS workloads and to run separate VPCs for production and pre-production. This has helped with auditability and development efficiency.

Data Engineering Team Productivity

The Moderna data engineering team faces big expectations from the company, and its members need to devote their scarce time to differentiating projects rather than to the plumbing of building and maintaining data pipelines. In Etleap, they found a solution that is tightly integrated with AWS and just works.

Etleap has accelerated pipeline creation and also cut the time needed to maintain pipelines. If there's a connection issue or a schema change, Etleap delivers alerts and proposed corrections.
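To make the idea of a schema-change alert concrete, here is a minimal Python sketch of how a pipeline tool might compare the schema it expects with the schema it observes at the source. It is an illustration only and not Etleap's implementation; the names `SchemaChange` and `diff_schemas` and the example column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaChange:
    """One difference between the expected and observed source schema."""
    kind: str    # "added", "removed", or "type_changed"
    column: str
    detail: str


def diff_schemas(expected: dict[str, str], observed: dict[str, str]) -> list[SchemaChange]:
    """Compare two {column: type} mappings and list every difference.

    Hypothetical helper for illustration; a real tool would also need to
    decide which changes are safe to apply automatically.
    """
    changes = []
    # Columns that appeared at the source since the pipeline was defined.
    for col in sorted(observed.keys() - expected.keys()):
        changes.append(SchemaChange("added", col, f"new column of type {observed[col]}"))
    # Columns the pipeline expects but the source no longer provides.
    for col in sorted(expected.keys() - observed.keys()):
        changes.append(SchemaChange("removed", col, f"column of type {expected[col]} is missing"))
    # Columns present on both sides whose type no longer matches.
    for col in sorted(expected.keys() & observed.keys()):
        if expected[col] != observed[col]:
            changes.append(SchemaChange("type_changed", col, f"{expected[col]} -> {observed[col]}"))
    return changes
```

Each returned `SchemaChange` could then be turned into an alert, along with a proposed correction such as adding the new column to the destination table.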

The small team has been able to manage pipelines into both an Amazon Redshift data warehouse and AWS Glue as its data lake catalog. They specify the schema and table name, and Etleap automatically creates the tables, pulls the data from the sources, applies the transformations, and then finally loads the data.
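The sequence described above (create the table, pull the data, apply the transformations, load) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Etleap's API: it uses an in-memory SQLite database as a stand-in for the warehouse, omits the schema name because SQLite has no schema namespaces of that kind, and the function `run_pipeline` and its parameters are hypothetical.

```python
import sqlite3
from typing import Callable, Iterable

Row = dict[str, object]


def run_pipeline(
    conn: sqlite3.Connection,
    table: str,
    columns: dict[str, str],
    source: Iterable[Row],
    transform: Callable[[Row], Row],
) -> int:
    """Create the destination table if needed, then extract, transform, and load.

    Hypothetical sketch: `columns` maps column names to SQL types, `source`
    yields raw rows, and `transform` cleans one row at a time.
    Identifiers are interpolated directly, so they must come from trusted config.
    Returns the number of rows loaded.
    """
    # 1. Create the table from the declared column definitions.
    col_defs = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')

    # 2. Pull rows from the source and 3. apply the transformation to each.
    rows = []
    for raw in source:
        clean = transform(raw)
        rows.append(tuple(clean[name] for name in columns))

    # 4. Load the transformed rows with a parameterized insert.
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

A caller would pass a connection such as `sqlite3.connect(":memory:")`, a table name, the column definitions, an iterable of raw rows, and a function that normalizes each row before it is loaded.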

Impact

Moderna has had a busy few years, to say the least. Along with its growing public profile has come meteoric growth in data and the need to consolidate it for analytics and AI/ML workloads. Thanks to the Etleap architecture and private cloud (VPC) deployment option, Moderna does not worry about scalability and is confident that its data stays secure.

The small data engineering team can now add multiple new data sources every week. This depends on a highly functioning ETL environment that balances simplicity and robustness. Moderna’s data team and even non-technical users have been able to quickly learn to build pipelines with Etleap. Etleap’s support team and its deep ETL experience are also a powerful resource. Etleap support minimizes the pipeline maintenance Moderna needs to worry about and also helps quickly resolve edge cases in pipeline creation and maintenance.

This all enables a highly effective team and analytics environment that lets Moderna devote more resources to its essential work around mRNA science and medicine.
