Hyperscale (Citus) is now GA as a built-in deployment option in Azure Database for PostgreSQL. Read Craig’s blog
Postgres has been a great database for decades now, and has really come into its own in the last ten years. Databases more broadly have also gotten their own set of attention as well. First we had NoSQL which started more on document databases and key/value stores, then there was NewSQL which expanding things to distributed, graph databases, and all of these models from documents to distributed to relational were not mutually exclusive. Postgres itself went from simply a relational database (which already had geospatial capabilities) to a multi modal database by adding support for JSONB.
But to me the most exciting part about Postgres isn’t how it continues to advance itself, rather it is how Postgres has shifted itself from simply a relational database to more of a data platform. The largest driver for this shift to being a data platform is Postgres extensions. Postgres extensions in simplified terms are lower level APIs that exist within Postgres that allow to change or extend it’s functionality. These extension hooks allow Postgres to be adapted for new use cases without requiring upstream changes to the core database. This is a win in two ways:
- The Postgres core can continue to move at a very safe and stable pace, ensuring a solid foundation and not risking your data.
- Extensions themselves can move quickly to explore new areas without the same review process or release cycle allowing them to be agile in how they evolve.
Okay, plug-ins and frameworks aren’t new when it comes to software, what is so great about extensions for Postgres? Well they may not be new to software, but they’re not new to Postgres either. Postgres has had extensions as long as I can remember. In Postgres 9.1 we saw a new sytax to make it easy to
CREATE EXTENSION and since that time the ecosystem around them has grown. We have a full directory of extensions at PGXN. Older forks such as which were based on older versions are actively working on catching up to a modern release to presumably become a pure extension. By being a pure extension you’re able to stay current with Postgres versions without heavy rebasing for each new release. Now the things you can do with extensions is as powerful as ever, so much so that Citus’ distributed database support is built on top of this extension framework.
Extensions in the real world
Extensions generally fall into a few different broad categories, though in reality there are not hard boundaries on what an extension can and cannot be:
- Custom data types
- New use cases/functionality
- Foreign data wrappers
Need a new datatype, there is an extension for it
Postgres is very open when it comes to adding support for new data types. We actually saw document support in Postgres way, way, way back with an XML data type. Then we saw hstore in the form an extension which was a key-value store directly in Postgres. Now we have a native datatype in JSONB which includes rich JSON document support so extensions aren’t needed for that case, but there are still others where they can be extremely useful. UUID itself is a native data, but you can have a number of functions to help with generation of them directly within the database by using
Let’s not stop with the basic ones, how crazy can we get? How about HyperLogLog. HyperLogLog is based on a research paper from Google. It implements K-minimum value, bit observable patterns, stochastic averaging, and harmonic averaging. If you’re like me and had to google pretty much every one of those things the simple way to explain it: HyperLogLog is great for approximate distincts which can be stored in a really small disk space, then composed over time to find intersections/unions of uniques across various buckets. If you’re building a web analytics tool or an ad network HyperLogLog may end up being your best friend. You’ve also got more common approximation datatypes like TopN which is great for building leaderboards.
Who doesn’t want more insights?
As an application developer, understanding what is going on with your database can be painful. Postgres already has a number of tools such as
EXPLAIN which will give you insights about query plans, but extensions can give you an extra leg up. One of the most valuable extensions is a monitoring extension that allows you to quickly get insights into how often a query was run and how long it took in aggregate and on average:
pg_stat_statements. Pgstatstatements essentially parameterizes queries so your
where conditions are removed so you can know which ones make most sense to optimize with indexes.
Speaking of indexes, one option is to go and add an index ot everything, but you may want to be a little more methodical than just indexing everything. HypoPG is an extension that will give you insights into what performance would look like if you hypothetically added indexes. Yes, it can tell you all on it’s own how to index your database.
Postgres: A relational database but also not
Datatypes and monitoring/insights both make Postgres better, but at the end of the day with them it’s still a relational database. Where extensions start to get really fun is when they change what Postgres is capable of. PostGIS is one of the larger and older extensions which turns Postgres into the worlds most advanced open-source geospatial database. PostGIS comes with new datatypes as well as operators. The short takeaway is if you want to do anything geospatial it can help.
Maybe geospatial isn’t your thing, instead you’re dealing time series log data. If you need easy partitioning of time series data so you can archive old data or you only query a recent subset then pg_partman has your back. It enhances the already built-in time series partitioning that arrived recently in Postgres to allow you to automatically create new partitions as well as remove old ones.
Then there is scaling out. Postgres is great, but what happens when you outgrow the limits of what a single node can do? Citus transforms Postgres into a distributed horizontally scalable database. Under the covers it’s sharding your data, and then has multiple executors to route queries accordingly. Meanwhile to your application your entire distributed setup appears as a single node Postgres database. You get all the benefits of sharding without the work.
This class of extensions truly take Postgres into new territory. Whether it’s becoming a geospatial, time-series, or distributed database they move Postgres out of it’s traditional relational realm.
What if you have data not in Postgres?
Okay I know what you’re thinking. Why wouldn’t you have data in Postgres? But somewhere along the line, before you came along, someone threw data into Redis or Mongo or MySQL or really just about any other database. Foreign data wrappers are a unique class of extension all on their own that allow you to connect from within Postgres directly to some other data source. Foreign data wrappers are extremely useful when you’re working to join some disparate data source with your system of record Postgres data. It can save a lot of time over elaborate ETL jobs and get the job done just as well. A word of caution against using them in a customer facing production work flow though as they’re often not the most performant mechanism
If you can dream it Postgres extensions can do it
Postgres continues to get better and better with each new release. The support for new index types, consistently improving performance, and in general building a richer feature set has resulted in a great database. But the ecosystem of extensions around it make it into something truly unique unlike any other database. If there is a secret weapon to Postgres success it’s not Postgres itself it’s how extensions make it even more.