AJ Welch of Chartio spoke to us recently about the PGConf Silicon Valley PostgreSQL conference, which takes place November 17-18 at the South San Francisco Conference Center.
AJ, a Data Engineer at the business intelligence and analytics company Chartio, will be speaking about “hacking” Postgres in a session titled “Using the PostgreSQL Extension Ecosystem to Perform Advanced Analytics.” At Chartio he helps customers get the most out of their data pipelines, warehouses and analyses. Prior to working at Chartio he built financial data warehouses for large Oracle shops.
Terry: AJ, you’ve mentioned in past conversations that you use PostgreSQL daily and are particularly excited about the growing ecosystem of open source extensions. How does this extensible architecture compare with NoSQL? And also to MySQL?
AJ: At Chartio, Postgres is one of the most popular databases among our customers, so I have the good fortune of working with it almost every day. However, we have many customers and prospects using other systems, and I’ve noticed they seem to have this perception that relational databases are only good at descriptive statistics (count, sum, avg, etc.) on medium-sized structured data sets. Once their needs expand to inferential, predictive or causal analysis on larger or unstructured data sets, they start getting curious about Hadoop and NoSQL.
With something like MySQL this may in fact be the case. Admittedly, I am not an expert on MySQL, but what we tend to see with our customers is that they struggle to perform even intermediate analyses because it lacks some of the analytical features available in Postgres (CTEs, window functions, set-returning functions, rich data types). That’s not to say the analyses aren’t possible; you just have to jump through hoops to get there. The MySQL Plugin API has been around for a while now and could potentially provide more advanced analytical capabilities, but to date it is more focused on storage engines, system tables, full-text search, authentication, auditing, etc.
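For readers who have not used the features listed above, here is a minimal sketch of a CTE feeding a window function to compute a running total. The `orders` table, its columns, and the `running_totals` helper are hypothetical, invented for illustration. To keep the sample runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in (window functions need SQLite 3.25 or later); the SQL itself is standard and should run unchanged on Postgres.

```python
import sqlite3

# Standard SQL: the CTE ("daily") aggregates per day, then a window
# function accumulates a running total over the aggregated rows.
QUERY = """
WITH daily AS (
    SELECT day, SUM(amount) AS total
    FROM orders
    GROUP BY day
)
SELECT day,
       total,
       SUM(total) OVER (ORDER BY day) AS running_total
FROM daily
ORDER BY day
"""


def running_totals(rows):
    """Load (day, amount) rows into a hypothetical orders table and
    return (day, daily_total, running_total) tuples."""
    # In-memory SQLite is an assumption made only so the sketch runs anywhere.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (day TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


sample = [
    ("2015-11-01", 10),
    ("2015-11-01", 5),
    ("2015-11-02", 20),
    ("2015-11-03", 7),
]

for row in running_totals(sample):
    print(row)
```

Without CTEs and window functions, the same result typically requires a self-join or a correlated subquery, which is the kind of hoop-jumping described above.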
With the Hadoop and NoSQL ecosystems, it’s less a question of extensibility and more a question of maturity and skill sets. Hadoop in particular has a very vibrant ecosystem when it comes to advanced analytical software, yet the data storage and processing market is still dominated, both in market size and available skills, by relational databases. So many businesses are taking an unnecessary risk by using one of those systems for functionality that they can get out of a modern RDBMS such as Postgres with all of its extensions, such as MADlib, postgresql-hll, pg_shard and those on pgxn.org.
Terry: Your talk details using the extension ecosystem of Postgres to perform advanced analytics. Could you share an example or two of actual use cases – along with what’s involved in terms of hardware, systems, etc.?
AJ: My favorite examples are when folks get a quick win where they otherwise thought they had their work cut out for them. This often happens with foreign data wrappers. Data integration is such a tough task for organizations today and traditional ETL projects can be a large investment both in terms of time and money. When people find out they can use Chartio to query Redis, Mongo, etc. via Postgres it tends to open their eyes.
MADlib is another great example of an extension that opens people’s eyes to the possibilities afforded by Postgres’ extension system. The ability to perform in-database statistics and machine learning gives organizations the ability to expose this functionality to a broader audience through tools such as Chartio instead of requiring the data to be sampled and dumped into isolated tools such as R or Python.
Terry: Finally, tell me a bit more about Chartio and who uses it.
AJ: Chartio is a modern cloud business intelligence tool. It can connect to a variety of relational databases, Postgres and its derivatives being among the most popular, as well as services such as Google Analytics, Salesforce, Twilio, etc. We focus on making it easy for non-technical users to build charts and dashboards and share these throughout their organization, but we don’t get in the way of analysts who want to dive in with SQL and use all of the advanced features of their database (extensions!). Organizations such as Optimizely, WeWork, Blackboard and Lumosity use us to make critical business decisions every day. If you’re interested in learning more, feel free to reach out to me.
Terry: Thank you for speaking with us, AJ.
Registration is now open for PGConf Silicon Valley. Use discount code CitusData20 to save 20% off the current prices. We hope to see you at PGConf SV!