Citus 11.3 is out! Now with tenant statistics. Read all about it in Marco's 11.3 blog post. 💥
Written by Claire Giordano
September 26, 2019
Update in October 2022: Citus has a new home on Azure! The Citus database is now available as a managed service in the cloud as Azure Cosmos DB for PostgreSQL. Azure documentation links have been updated throughout the post, to point to the new Azure docs.
When Django developer and Azure Postgres* engineer Louise Grandjonc confirmed that she could sit down with me for an interview in the days leading up to DjangoCon 2019, I jumped at the chance. Those of you who were in the room for Louise’s talk this week probably understand why. Louise explains technical topics in a way that makes sense—and she often uses unusual (and fun) examples, from crocodiles to owls, from Harry Potter to Taylor Swift.
And since I experience a bit of FOMO whenever I miss a fun developer conference like DjangoCon, I especially wanted to learn more about Louise’s DjangoCon talk: Postgres Index Types and where to find them.
Here’s an edited transcript of my interview with Louise Grandjonc of Microsoft (@louisemeta on Twitter.)
* Editor’s Note: Azure Database for PostgreSQL is a fully-managed database as a service. Now including Hyperscale (Citus) as a built-in deployment option that enables you to scale out Postgres horizontally on Azure.
The 1st thing: I want you to understand that Postgres indexes are useful for 2 reasons. Performance, and Constraints.
The 2nd thing: I hope people walk away with a clear overview of all the different options you get when you create an index—including partial indexes, unique indexes (that force a constraint), multi-column indexes, as well as standard (no condition) indexes. I want to help Django developers understand when to pick which one, and when the different index options can help you.
Third, I want you to understand the different types of most popular Postgres indexes and what differentiates them, including balanced trees (BTrees), GIN, GIST, and BRIN indexes.
Oh, and one more thing about my talk…. I always try to make sure that the slides of my talks are useful by themselves, even if you missed the talk. Or in case you get distracted during my talk, so it’s easy to catch up when you start listening again! Let me know if I succeeded. (Link to Louise’s DjangoCon 2019 slides.)
And I add in the crocodile drawings because, well, they are adorable and they help to keep my audience engaged. When I was naming my first talk on Postgres indexes, I had just read this cute story about birds eating food out of crocodile’s teeth and it also helps the crocodile. Whenever I give a very technical talk, I like to have a story, an example that attendees can follow.
Well, if you have never learned anything about databases, it’s true that getting started with a database can be complex… and of course people don’t want to break anything. It’s easy to be scared of breaking things.
So first I empathize! I tell people that it’s already so much work to learn Python, and Django—so it can be overwhelming to have to learn SQL and databases, too.
And yet—if you’re curious, and interested in understanding more about Postgres, then you’ll be able to learn how to leverage Postgres and your ORM… And there are some great resources to help you.
There is a useful website called pgexercises.com. It’s a good resource for learning SQL—in a PostgreSQL environment. While it’s true that your Django ORM covers 90% of what Postgres can do in terms of SQL, you might not know about the other 10%. So many people don’t know what things Postgres can do that their ORM can’t. It’s so hard to know what you don’t know.
So the pgexercises website is a good way to learn more about what PostgreSQL can do.
Markus Winand wrote a book, SQL Performance Explained. I recommend it.
“SQL Performance Explained” was one of the first books I read about Postgres. It was the first year after I graduated from university, after I’d been working with Postgres for a while and once I became more curious. Markus’s book helped me understand so so much about the Postgres database and what I was doing wrong. As well as what I was doing right.
It’s a good book for Django developers.
Also, it’s a short book. That’s a very important thing. Some books are so huge, you just know you’re not going to read the entire thing. This is especially true with technology books.
Conferences! I have learned so much from attending conferences and having people share their knowledge with me. I really love how in the open source community—like Python, Django or Postgres—everyone seems eager to share their learnings.
Yes. Django includes the Django debug toolbar django-debug-toolbar. It does so many things—how about I focus on what the debug toolbar does for SQL and for SQL queries?
When you’re creating a view in Django, the Python code will retrieve the data that will be displayed in your template. The view will send the data and the page template will display the data. When you open your new endpoint (something like
users/louisegrandjonc), the endpoint will display whatever you want, and the
django-debug-toolbar will ADD something to that page, which will contain all the queries that have been executed when loading the page (as well as the EXPLAIN plan of the queries ). It also gives you the code line that generated the query, which is amazing to debug a loop that would generate unfortunate queries for example.
My advice: as long as you’re working on a local database, on your own laptop—not in production—I tell developers to play with the database. What I mean is: try to change the configuration of your database, for example to be able to have logs. (Logs are so useful.
django-debug-toolbar is a good tool, but it doesn’t do everything, and logs definitely augment
django-debug-toolbar.) And also try to run SQL queries, write them, test them. Learning on a dataset that you know, on a project that you love will make it so much easier to assimilate new things!
When you’re working with a local database, when you’re not scared of screwing up your data, you can more easily test things to validate your understanding and learn how things work, experientially.
For example, I recommend you learn to use pg_stat_statements in Postgres—both in production, and also in testing—to find very slow queries.
The first thing I have to say is that the Django community is really wonderful. Welcoming for newcomers. And diverse. The Django community has been working on diversity & inclusion for a long time. In fact, the first conference I ever presented at was DjangoCon Europe. I got the courage to submit because I knew it would be a good crowd to talk in front of, I knew the audience would be positive, supportive, open-minded, even though I was sure I would be a little bit stressed when the time came to give the talk.
Oh, and the Django documentation is really good: it’s easy to find the answers that you need. And the getting started documentation is also good for, well, for getting started. But I still use it when I forget things like, “what’s the command to create a new app again?”
The Django community is quite active online, for example on the mailing list, or on Stack Overflow where you’re likely to find answers to some of your Django questions, too.
Django supports a lot of data types that are specific to Postgres, such as JSONB, tsrange, arrays, hstore, and more. Plus Django supports Window functions, subqueries, aggregations… as well as the full range of indexes that you get out of the box with Postgres. And Django supports Postgres’s full text search feature—also PostGIS which is another popular Postgres extension used for geospatial.
And a lot of these features come from contributions from the community. Some started as applications that then were integrated in core. As part of the Postgres team at Microsoft, in addition to working on Azure Database for PostgreSQL, one of my Citus open source projects is a Django library called django-multitenant.
Django-multitenant makes it easy for developers building multi-tenant apps to build their Django app on top of Citus, which is an extension to Postgres that scales out Postgres horizontally. That is just one example of a open source contribution from my own database team at Citus—there are so many valuable community contributions from others, too!
Someone recently told me that one of the reasons Django is so good with Postgres is because the Django creators were Postgres users. I didn’t know that. I’m told that they—the Django creators—built good compatibility with Postgres from scratch, from the beginning. And then of course the community contributions started.
Django is constantly evolving as developers contribute and add to it—so Django support for Postgres just keeps getting better.
One reason to know SQL and to learn a bit about Postgres is to optimize performance. So you can answer questions like: why is that query slow? If you don’t understand the query, you can’t really debug it, find the missing index, or a better way to write it.
The 2nd reason: even if Django has gone pretty far into what it supports for Postgres, there are still some features that are not supported by the Django ORM yet (such as GROUPING SETS, LATERAL JOIN, Common Table Expression—often called CTEs—and Grouping sets. Maybe you don’t need those features, or maybe you just don’t know about them. And from my experience, when I discover something new, I realize that I needed it all along and so I start using it!
The Django ORM is great and works really well with Postgres. And it’s also great to know that you can use extra features that will help you build a performant and beautiful app!
Thank you. I’ve been working with Django the last 6 years and just love it. And I love the Postgres features that have rolled out in Django. It’s a very cool framework—and I’m so excited to be able to give a talk at DjangoCon on Postgres indexes this year.
This was my fave talk at #DjangoCon of all the great talks I saw! I got ideas I want to implement immediately.— Erin Mullaney (@erinrachel) September 25, 2019
Can’t wait for the slides from @louisemeta. Excellent info on very specific indexes to speed up db calls in Django 2.2+. ⏩⏩⏩ https://t.co/MSTaBvUoCi