Invoice2go is a SaaS company, with offices in Australia and Silicon Valley, that makes invoicing quick and convenient for small business owners. Their custom invoice templates, payment manager, and mobile app for invoices-on-the-go help hundreds of thousands of small business owners bill over $1 billion per month.
Growing small businesses and exceeding user expectations is only possible through holistic, integrated data management. To succeed, each Invoice2go team must be equipped with bulletproof data for decision-making. From analysts building stat boards and experimenting with new app features, to executives who need to know how the business is performing hour to hour, no one should be left in the dark.
“Our company is adamant that everyone has access to all data and that we operate an open environment,” explains Alain Kramar, a Senior Software Engineer at Invoice2go. “To sustain a healthy business nothing should be off-limits in terms of what information can be tapped into.”
To enhance the customer experience, as well as maintain their position as one of the world’s most downloaded invoice and billing apps, Invoice2go realized they needed to overhaul their data infrastructure. Not only was customer volume growing and adding enormous amounts of data to be stored and processed, but competing invoice apps were entering the market each year. This meant customized, data-informed outreach was essential to retain and expand their user count. Unfortunately, their data ecosystem lacked the agility, robustness, and modern framework needed to achieve their targets.
Why spend time building pipelines from scratch when a vendor like Etleap can do it for us, more effectively than we can do it on our own?
Alain Kramar, Senior Software Engineer, Invoice2go
Better data means better business
Each Invoice2go division leverages data uniquely to help the company innovate and expand its market reach. Product management tracks how users interact with the website and app; user acquisition monitors installations, app ratings, and app rankings; and customer research develops a qualitative understanding of the ideal customer. For all divisions to succeed, individually and as a team, data pipelines need to be thorough, accessible, and reliable. Previously, the core data streams relied on two Apache Spark clusters, but as the company grew and user demands evolved, the data ecosystem became fragile and risk-laden. The Data Team was spending countless hours fixing bugs rather than focusing on new development, which left less time to support individual teams building custom projects to improve the company. The causes were threefold:
Lost data threatens system shutdown
Cluster One, running on AWS EMR, read objects from Kinesis streams, decoded the Protobuf using a common schema library, and wrote the results to S3 as Parquet files. Cluster Two, running on Databricks, picked up the Parquet files from S3 and loaded them into Redshift. The Spark clusters ran on an old Protobuf library, so as schemas evolved, objects with new attributes were skipped altogether. The team feared that a major schema version roll-out would be incompatible and trigger total system collapse.
Complex and code-intensive workflow handicaps data reach
The clusters required thousands of lines of Scala code and multiple build tools and procedures to deploy. This necessitated a large team of engineers, who were bogged down writing code and fixing bugs. System complexity bred fear that any alteration would cause severe disruption, and made it very difficult for new engineers to contribute without hand-holding. This inflexibility created system-wide fragility and limited the trust the organization placed in the data.
Obsolete ETL initiates scramble for a new provider
In the meantime, their trusted ETL provider, Alooma, dropped support for the AWS ecosystem. This put the Data Team on alert: ETL was critical to delivering actionable data to the company's divisions, and AWS was compatible with their sources and data objectives. Without ETL, unstructured and semi-structured data (such as JSON files) would be inaccessible to analysts. And because their data infrastructure was deeply entwined with Amazon Redshift, they deemed migrating to a different data warehouse costly and impractical.
I want the Data Team to spend less time on boilerplate tooling that can be found off the shelf, and more time enabling product managers, customer research, and analysts to do their jobs more efficiently. A tool like Etleap lets us build out a new pipeline with a few clicks. This allows our team to optimize time spent on business-specific Invoice2go objectives, not building out EMR clusters.
Alain Kramar, Senior Software Engineer, Invoice2go
Building a modern data ecosystem with AWS and Etleap
Invoice2go assessed the data needs unique to their organization and sought to leverage a combination of AWS building blocks and robust ETL to bring their pipelines into the light. As they went to the drawing board, they aimed to reduce overhead, nix time spent on coding, and give all Invoice2go divisions the data tools they needed to meet high-level company objectives.
They also needed an ETL solution tailor-made for AWS and capable of ingesting a diverse range of content, including JSON, Parquet files, Postgres databases, and webhooks. In all, they considered twelve ETL vendors and trialed four. Only Etleap checked all the boxes. With the previous clusters, the transformation code ran to thousands of lines and took multiple engineers to develop. Etleap condenses this workload into a few clicks, producing pipelines in minutes.
Comprehensive solution brings stability, security, and simplicity
- Amazon DynamoDB tables send a full-fidelity copy of the saved object onto DynamoDB Streams. The source DynamoDB objects include a schema version as a top-level attribute and every major object is assigned a stream. This transition is seamless and without risk of data loss.
- AWS Lambda generically gathers the DynamoDB objects, determines their source table, parses their major version, and then dispatches them to a corresponding Firehose stream. The entire Lambda is under 80 SLOC of JavaScript, a fraction of the code the Spark clusters required, which minimizes the potential for breakage and bugs.
- Amazon Kinesis Data Firehose batches events and writes them to S3 as line-delimited JSON, handling partitioning and compression along the way.
- Terraform is used to define all AWS resources, and the steps to this pipeline pattern have been modularized so ingesting an entirely new table or major version takes a few minutes of copy-pasting and filling in the blanks.
- Etleap completes the puzzle by extracting, transforming, and loading objects from S3 to Redshift. “We are approaching a couple of hundred pipelines from Postgres, S3 sources, and webhooks coming in, all being stored in AWS Redshift,” explains Kramar. “The work that used to go into building these pipelines manually is phenomenal. With Etleap we simply click and go, setting up a pipeline in a couple of minutes.”
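The Lambda step above (read a DynamoDB Streams record, identify its source table, parse the major schema version, and route it to the matching Firehose stream) can be sketched as plain JavaScript. This is a minimal illustration, not Invoice2go's actual code: the attribute name `schemaVersion`, the stream-naming convention, and all function names are assumptions, and the real handler would finish with one `PutRecordBatch` call per group via the AWS SDK.

```javascript
// Extract the table name from a DynamoDB Streams ARN, e.g.
// "arn:aws:dynamodb:us-east-1:111122223333:table/Invoices/stream/2020-01-01T00:00:00.000"
function tableNameFromArn(arn) {
  const match = /:table\/([^/]+)\//.exec(arn);
  if (!match) throw new Error(`Unexpected stream ARN: ${arn}`);
  return match[1];
}

// Parse the major schema version from the saved object's top-level
// attribute (here assumed to be a string like "3.1" under "schemaVersion").
function majorVersionOf(newImage) {
  const version = newImage.schemaVersion && newImage.schemaVersion.S;
  if (!version) throw new Error("Object missing schemaVersion attribute");
  return version.split(".")[0];
}

// Map a table and major version to a Firehose delivery stream name
// (naming convention assumed for illustration).
function streamNameFor(table, major) {
  return `${table}-v${major}`;
}

// Group a batch of DynamoDB Streams records by destination Firehose stream.
// A real Lambda handler would follow this with one PutRecordBatch per group.
function groupByStream(event) {
  const groups = {};
  for (const record of event.Records) {
    if (!record.dynamodb.NewImage) continue; // deletions carry no new image
    const table = tableNameFromArn(record.eventSourceARN);
    const major = majorVersionOf(record.dynamodb.NewImage);
    const stream = streamNameFor(table, major);
    (groups[stream] = groups[stream] || []).push(record.dynamodb.NewImage);
  }
  return groups;
}
```

Keeping the routing pure like this (no AWS calls until the final batch write) is what makes such a small Lambda easy to test and hard to break.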
Having Etleap ingest line-delimited JSON is bulletproof. We point to an S3 bucket and path, the Data Wrangler pulls it apart, and we do any final touches to types and that’s it.
Alain Kramar, Senior Software Engineer, Invoice2go
Bigger data, stronger analytics, and a smaller, more effective Data Team
In the six months since Invoice2go shut down their Apache Spark clusters, brought the DynamoDB-Lambda-Firehose environment online, and migrated from Alooma to Etleap, the business has reaped the benefits.
From executives to analysts: data on everyone’s plate
Since the revamp, product management has been able to construct more fluid app flows that lead to higher conversion, user acquisition has spearheaded dynamic and targeted marketing campaigns, and customer research understands their patrons on a more granular level. All of this means Invoice2go can better connect with their ideal client, the small business owner, and stand out against the competition to remain among the top invoice apps globally. “A recent product manager I onboarded was just in awe of how much data we’ve made available,” asserts Kramar, “and it has given all teams the chance to dig in deep.”
The executive team is just as keen on the benefits. Since the renovation, they’ve noted the seamlessness and stability of the data flow. Whether evaluating subscriber count, payment volume from the previous day, or how the business is performing at a high level, they are assured that the data they’re using to make competitive decisions is as immediate as it is impactful.
Leaner data team means time spent wisely
In today’s competitive labor market, finding software engineers with the knowledge and flexibility to turn out exceptional code is a challenge, so maintaining a compact, efficient team is imperative. With a four-member team, Kramar says spending time wisely is critical. Now Invoice2go’s Data Team can perform complex transformations with Etleap’s Data Wrangler in a fraction of the time and at a fraction of the cost. “Etleap enables us to be a leaner, but more impactful data team focused on business-related topics, rather than reinventing the wheel every day,” he concludes.
Core mission back in focus
With modern data pipelines in place and robust ETL to match, Invoice2go can harness data like never before. Whether it’s subscriber volume, email events, tracking data, or web clicks, teams have more data than ever with which to make smart decisions and execute profit-generating projects. Freed from the threat of bugs and node failures, Invoice2go can focus on its core mission of empowering small businesses with superior payment and invoice options.
Etleap made the migration so easy and allowed us to beat our deadline. It was all completed within a couple of months. They provided continuous support; whether it was troubleshooting Postgres timeouts or ingesting unexpected values, their response was always first class.
Alain Kramar, Senior Software Engineer, Invoice2go