When PagerDuty set out to power insights through a new data warehouse, they could not have predicted how much the data would permeate the organization. It started as an initiative to provide reports to one internal group, but it quickly created a culture in which everyone at PagerDuty can use the data warehouse to improve the lives of their customers. The engineering team could now make any and all data available in a central data warehouse. They democratized their data, allowing analysts across the organization to draw insights from the data warehouse whenever they needed it, using any business intelligence tool they wanted.
However, establishing a data-informed culture did not come without challenges. The new data warehouse, which was never intended for widespread use, was difficult to grow and maintain. The special projects team that started the data warehouse found itself consumed with requests, spending half of its time creating data pipelines and updating database schemas. The team could not keep up with the request pipeline on top of its existing responsibilities, which were central to PagerDuty's growth.
To ease their pain, the team searched for a centralized data flow platform that would allow them to rebuild and maintain the data warehouse with minimal engineering work. The platform would need to handle the entire Extract, Transform, and Load (ETL) process, which entails pulling data from the source, performing any intermediary parsing and transformations, and loading the transformed data into the data warehouse.
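For readers less familiar with the pattern, the sketch below illustrates those three stages in the abstract. The file name, table, and columns are hypothetical, and a real pipeline like PagerDuty's would load into Redshift rather than an in-memory database; this is only a minimal example of the extract-transform-load flow described above.

```python
# A minimal ETL sketch with hypothetical source and table names.
import csv
import sqlite3  # stands in for the real warehouse connection

def extract(path):
    """Pull raw rows from a source export (here, a CSV file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Apply intermediary parsing and cleanup before loading."""
    return [
        {"account_id": r["account_id"].strip(), "plan": r["plan"].lower()}
        for r in rows
        if r.get("account_id")  # drop malformed rows
    ]

def load(rows, conn):
    """Write the transformed rows into the warehouse table."""
    conn.executemany(
        "INSERT INTO accounts (account_id, plan) VALUES (:account_id, :plan)",
        rows,
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (account_id TEXT, plan TEXT)")
    load(transform(extract("accounts_export.csv")), conn)
```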
Their new process, with Etleap, cut the time to complete data flow requests from weeks to hours.
After evaluating several alternatives, they discovered Etleap and were able to get up and running immediately. Their new process, with Etleap, cut the time to complete data flow requests from weeks to hours. Etleap empowered them to identify and resolve any data issues, and to handle the custom data transformations that are instrumental to performing analyses across their services. Before long, PagerDuty had consolidated over 20 data sources into one data warehouse, and analytics had permeated the organization.
Today, PagerDuty continues to grow its data warehouse with Etleap to drive insights for the company. With the data warehouse now easy enough for even analysts to maintain and scale, the engineering team can focus on the PagerDuty platform.
Background
PagerDuty helps digital businesses and their technical teams keep pace with customer expectations without sacrificing reliability or agility. Their digital operations management service provides visibility and actionable insights into applications, services, and infrastructure. They take the worry out of potential incidents for developers and DevOps teams by sending notifications, surfacing relevant information, and streamlining the incident management lifecycle.
To ensure they are providing the best experience for their customers, PagerDuty created a team to manage a data warehouse and drive internal reporting and data availability. At first, this team was focused on providing internal reports to a small subset of the company. The data warehouse was initially created using manual, complicated scripts that combined various data sources. However, as word spread about the team and its capabilities, requests began pouring in from across the organization and the data warehouse became more and more difficult to manage.
Challenge
Reporting capabilities grew from a nice-to-have to something that was routinely requested across the entire organization. According to Thomas D., an engineer responsible for data engineering and special projects, “It kind of just grew from there. Unfortunately, our team didn’t grow as fast as the demand.” Recognizing that data engineers are difficult to find in the Bay Area, Thomas knew that the special projects team would need to grow its offerings without expanding its personnel resources.
[Custom ETL scripts were] a huge maintenance nightmare.
After assessing their initial data needs, PagerDuty decided to build their data warehouse on Amazon Redshift. They started with a blank slate, but soon found themselves developing custom scripts for each data source to funnel and format data into the data warehouse. Those scripts served their needs until more and more requests rolled in for loading additional data sources. Although internal customers were happy with the new ability to perform analysis on centralized data, maintaining the data flows and the warehouse began to consume the lives of the special projects team.
“It was a huge maintenance nightmare. It didn’t handle schema migrations so every time somebody changed a column in a database it basically forced us to manually change the code, deploy, and half a day later it failed due to some other reason,” said Thomas.
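The fragility Thomas describes comes from hard-coding source schemas into each script. The hypothetical snippet below (not PagerDuty's actual code) shows why: the column list is frozen into the INSERT statement, so a renamed or added column upstream breaks the load until someone edits, redeploys, and re-runs the script by hand.

```python
# Hypothetical hand-rolled load step with a hard-coded column list.
# If the source team renames "plan" or adds a column, this INSERT
# fails (or silently drops the new data) until the script is edited
# and redeployed -- the schema-migration problem Thomas describes.
INSERT_SQL = """
    INSERT INTO warehouse.accounts (account_id, plan, created_at)
    VALUES (%s, %s, %s)
"""

def load_accounts(source_rows, cursor):
    for row in source_rows:
        cursor.execute(
            INSERT_SQL,
            (row["account_id"], row["plan"], row["created_at"]),
        )
```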
…there was a growing need to be data-driven to support our growing business.
As data flow requests continued to increase, Thomas and his team found themselves spending about half of their time creating and updating data flow processes. For a team with other responsibilities related to core business areas such as software development and billing, there simply were not enough hours in the day to undertake such a massive, time-consuming process. According to Thomas, “people would have to file a ticket with us and wait a couple of weeks, or a month, or more.” Continuing down this path would mean saying “no” to requests or having to make additional hires just to keep up. “We’re a tiny team compared to the amount of people in the company, so we couldn’t serve everybody, and yet there was a growing need to be data-driven to support our growing business.”
As the data warehouse grew in both complexity and reach, PagerDuty placed analytics into the hands of those outside the special projects team by allowing them to access data through any BI tool of their choice. The growing population of users accessing the data warehouse made it all the more important to manage the data efficiently and establish a reliable single source of truth.
Despite these challenges, Thomas pointed out that “Redshift has… become the focal point of any kind of data analysis within the company because it’s fast and you can join across multiple databases…” The solution lay not in changing technologies or making additional hires in a crowded talent market, but in simplifying and streamlining the data flow process.
Resolution
To alleviate the time shortage created by data pipeline requests, Thomas and the rest of the team set out to find a SaaS vendor that would simplify the entire process, including pipeline management, automatic schema updates, and data transformations.
“The turning point was we were spending… probably half our time maintaining these things. Every day we would get a headache in the morning,” said Thomas.
PagerDuty evaluated multiple solutions for the right mix of flexibility and simplicity. The options ranged from overly simple solutions that could not be adapted to their particular needs to all-encompassing software that matched every increase in flexibility with overwhelming complexity.
When it came time to evaluate Etleap, the team first connected their product data and two other sources central to powering the PagerDuty platform. What struck Thomas was the extreme ease of connecting Etleap to these data sources, transforming them with Etleap’s interactive data wrangling, and loading them into the Redshift data warehouse.
[Etleap] allowed us to be nimble and agile while still providing data for the whole company, across all our databases.
“One of the main selling points for us was that Etleap was a fully managed service. We didn’t have to maintain any custom hardware or services ourselves,” said Thomas. They even configured more complex and critical transformations with help from the Etleap support team.
Ultimately, PagerDuty chose Etleap for its ease of connecting and maintaining data sources. Etleap also provided flexibility by allowing for data transformations that were not possible with the other providers. That flexibility allowed them to join user information across databases without sacrificing data security and user privacy. These benefits came packaged as a SaaS solution without the need to install or maintain the software internally.
According to Thomas, “[Etleap] allowed us to be nimble and agile while still providing data for the whole company, across all our databases.”
Results
Since adopting Etleap as its data platform, PagerDuty has successfully implemented data flows and ETL for over 20 major data sources, including multiple PostgreSQL and MySQL databases, Salesforce, Marketo, Google Analytics, JIRA, and various log files in S3. Beyond simplifying the ETL process itself, Thomas saw a drastic decrease in time spent on setup and maintenance tasks such as access management. He can handle everything from user management to database connections to data transformations within Etleap.
“When I go to sleep, I don’t have to wake up in the middle of the night fixing our ETL pipelines.”
With ETL running virtually on autopilot, the special projects team regained time to focus on operations more central to software development and the business in general, without turning to the competitive talent marketplace. Now, Thomas can sleep soundly knowing there are no emergencies waiting to derail his priorities the next day: “When I go to sleep, I don’t have to wake up in the middle of the night fixing our ETL pipelines.”
A complete and reliable data warehouse also ensures accuracy in analytics. With Etleap, PagerDuty consolidates data into a single data warehouse in a clean and consistent format, available to users through any BI tool they choose. According to Thomas, “with these new [BI] tools, you don’t have to worry about setting up the connection, you don’t have to worry about firewalls, you don’t have to worry about SSH. You just log into the tool and the tool connects to our Redshift cluster.”
Thomas is excited about the opportunities for company-wide data availability created by the improved ETL processes. He beamed, “I’ve run out of requests… There are [currently] no more databases to put into Redshift.” Democratized data has gone from being a constant headache to fulfilling the promise of a data-informed company and culture.