Heap automatically captures every user action on a company’s web or iOS application (clicks, taps, swipes, form submissions, page views, and so on) for more than 4,000 customers and lets their analysts query all of that information immediately. Similar solutions track only a handful of activities to avoid compromising performance. As a result, when a question arises about an untracked activity, an engineer must first start logging it, which adds cost and a lengthy delay while the data is collected.
Heap automatically captures all the data, all the time. The challenge with this approach is that Heap collects up to 100 times as much data as other solutions, which demands a far more complex data infrastructure. For the first generation of its product, Heap used Amazon Redshift. The solution worked well, but the Redshift data model was not optimal for the types of analyses Heap wanted to perform. As a result, Heap could not scale to accommodate the growing number of websites and applications it anticipated onboarding, and going after larger websites was out of the question.
Heap looked at a number of approaches to rewriting its solution and chose Citus because of its Postgres lineage and the ability to easily scale a cluster of commodity servers. Dan Robinson, Lead Engineer at Heap, was brought in to scale the company's infrastructure, and his first project was to move the product onto a Citus backend. Today, all of Heap’s data for all of its customers lives in a Citus cluster, which also powers all of Heap’s analytics.
“We are essentially a database company built around a Citus cluster. Without the right database in place, we would likely have failed,” said Robinson. “With Citus, we are able to rapidly scale our business, and the response of our customers has been fantastic.”
“Thanks to Citus, we’re powering a product that is really kind of magical for a lot of customers. The speed and performance of our database make it possible for them to immediately perform truly advanced analytics on any user activities they want to explore. All the data is already there.” Dan Robinson, Lead Engineer at Heap
Heap and Citus DB
Advanced Analytics on Huge Amounts of Data
Heap automatically captures every activity of every visitor and user of a customer’s website or iOS application so that the customer’s analysts can easily and immediately query the data. Citus has enabled Heap to offer customers very advanced query capabilities. For example, an analyst could define a cohort as “users who have uploaded a photo and have logged in three times in the last week and accepted a friend request in the last month,” and then filter a conversion funnel for people in that cohort.
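To make the cohort example concrete, here is a minimal Python sketch of the same logic over a small in-memory event list. It is an illustration only, not Heap’s implementation or its query language: the event names (`upload_photo`, `login`, `accept_friend_request`, `view_pricing`, `signup`), the `Event` type, and the helper functions are all hypothetical, “three times” is read as “at least three times,” and a month is approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    user_id: str
    name: str      # hypothetical event name, e.g. "login"
    at: datetime   # when the event happened


def count_events(events, user_id, name, since=None):
    """Count a user's events with the given name, optionally only those at or after `since`."""
    return sum(
        1 for e in events
        if e.user_id == user_id and e.name == name and (since is None or e.at >= since)
    )


def in_cohort(events, user_id, now):
    """Uploaded a photo (ever), logged in >= 3 times in 7 days, accepted a friend request in 30 days."""
    week_ago = now - timedelta(days=7)
    month_ago = now - timedelta(days=30)  # assumption: "last month" means the last 30 days
    return (
        count_events(events, user_id, "upload_photo") >= 1
        and count_events(events, user_id, "login", week_ago) >= 3
        and count_events(events, user_id, "accept_friend_request", month_ago) >= 1
    )


def funnel_counts(events, user_ids, steps):
    """For each funnel step, count the users who reached it after completing the earlier steps in order."""
    counts = [0] * len(steps)
    for uid in user_ids:
        reached = 0
        user_events = sorted((e for e in events if e.user_id == uid), key=lambda e: e.at)
        for e in user_events:
            if reached < len(steps) and e.name == steps[reached]:
                reached += 1
        for i in range(reached):
            counts[i] += 1
    return counts


NOW = datetime(2024, 1, 31)  # fixed reference time so the example is deterministic


def ev(user, name, days_ago):
    return Event(user, name, NOW - timedelta(days=days_ago))


# Made-up sample data for three users.
EVENTS = [
    ev("alice", "upload_photo", 60),
    ev("alice", "login", 3), ev("alice", "login", 2), ev("alice", "login", 1),
    ev("alice", "accept_friend_request", 10),
    ev("alice", "view_pricing", 2), ev("alice", "signup", 1),
    ev("bob", "login", 3), ev("bob", "login", 2), ev("bob", "login", 1),
    ev("bob", "view_pricing", 2), ev("bob", "signup", 1),
    ev("carol", "upload_photo", 5),
    ev("carol", "login", 6), ev("carol", "login", 4), ev("carol", "login", 2),
    ev("carol", "accept_friend_request", 20),
    ev("carol", "view_pricing", 1),
]
USERS = ["alice", "bob", "carol"]

COHORT = [u for u in USERS if in_cohort(EVENTS, u, NOW)]
FUNNEL = funnel_counts(EVENTS, COHORT, ["view_pricing", "signup"])

if __name__ == "__main__":
    print("cohort:", COHORT)
    print("funnel (view_pricing -> signup):", FUNNEL)
```

In this sample, the user who never uploaded a photo is excluded from the cohort before the funnel is computed, which is the “filter a conversion funnel for people in that cohort” step. The sketch scans a list in memory; the case study describes the same kind of question being answered against a distributed Postgres-based cluster.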
Both existing customers and prospects are responding to these capabilities with tremendous excitement. “Thanks to Citus, we’re powering a product that is really kind of magical for a lot of customers,” said Robinson. “The speed and performance of our database make it possible for them to immediately perform truly advanced analytics on any user activities they want to explore. All the data is already there.”
Cost-effective Scaling and Analytics
Citus has enabled Heap to cost-effectively scale its database horizontally across a cluster, which has now grown to 30 nodes that together provide approximately 3.6 TB of memory. Each commodity server has 16 CPU cores and 3.2 TB solid-state drives (SSDs). Heap now has approximately 75 TB of data on disk, representing upwards of 100 billion events. Citus has also enabled Heap to maintain a mostly relational query model, providing all the benefits Heap needs from a traditional database without the single-node restriction, while also facilitating lightning-fast analytics.
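The cluster figures above imply a few per-node and per-event numbers, worked out in the short Python sketch below. The inputs are the approximate figures quoted in the text; the derived values rest on assumptions that the source does not confirm: decimal units (1 TB = 1000 GB), memory spread evenly across nodes, 3.2 TB taken as the total SSD capacity of each server, and exactly 100 billion events (the text says “upwards of,” so the bytes-per-event figure is a rough upper bound on the average).

```python
# Figures quoted in the case study (all approximate).
NODES = 30
TOTAL_MEMORY_TB = 3.6
CORES_PER_NODE = 16
SSD_PER_NODE_TB = 3.2      # assumption: total SSD capacity per server
DATA_ON_DISK_TB = 75
EVENT_COUNT = 100e9        # "upwards of 100 billion", so a lower bound

# Derived values (assuming decimal units and an even spread across nodes).
memory_per_node_gb = TOTAL_MEMORY_TB * 1000 / NODES
total_cores = NODES * CORES_PER_NODE
total_ssd_tb = NODES * SSD_PER_NODE_TB
disk_utilization = DATA_ON_DISK_TB / total_ssd_tb
avg_bytes_per_event = DATA_ON_DISK_TB * 1e12 / EVENT_COUNT

if __name__ == "__main__":
    print(f"memory per node:     ~{memory_per_node_gb:.0f} GB")
    print(f"total CPU cores:     {total_cores}")
    print(f"total SSD capacity:  ~{total_ssd_tb:.0f} TB")
    print(f"disk utilization:    ~{disk_utilization:.0%}")
    print(f"avg bytes per event: at most ~{avg_bytes_per_event:.0f}")
```

The on-disk total presumably includes indexes and any other storage overhead, so the per-event figure should be read as a back-of-the-envelope estimate rather than a measured row size.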
Working with Citus Data
In addition to using the core Citus database, Heap worked with the Citus team on custom extensions for funnel computations and for behavior and retention analysis. “From the start, Citus has been extremely responsive to our needs,” said Robinson. “Our custom extensions have unlocked very powerful new features for our customers. It may have been possible to do some of the same analysis without these extensions, but it would definitely have been a lot slower.”
Because Citus is based on PostgreSQL, Heap was able to leverage its extensive knowledge of PostgreSQL and the Postgres ecosystem. Citus also enabled Heap to use a data model that would support some of the more advanced analytics that the company wanted to make available to customers. “Being on a fully relational data model with the full indexing and querying power of Postgres was extremely valuable when it came to adding new analytics capabilities. Yet Citus also enabled us to maintain very fast response times, even with extremely complex queries,” said Robinson.