If you want to learn more about Citus on Microsoft Azure, read this post about Hyperscale (Citus) on Azure Database for PostgreSQL.
This year, I was so excited about doing a workshop about optimizing Python & Django apps with Postgres superpowers for the PyCon 2020 conference.
Working with other developers on performance is something I always find amazing. So props to the Python people at Microsoft who encouraged my team to create a workshop on Postgres for PyCon 2020. Thank you to Nina Zakharenko, Dan Taylor, & Crystal Kelch.
Alas, we had to change our plans and find other ways to share the PostgreSQL workshop content that we had prepared. So I created a video on the topic of database performance for Django developers, to help teach you the PostgreSQL tips and tricks that have served me well in optimizing my Django apps. These tips are what I call “Postgres superpowers.”
Being myself a developer, and having worked with Django for years, I know that there are a few traps to avoid in order to have fast apps. In this video, I go through a few performance traps to avoid—as well as tools you can use to identify pain points.
Where to find this PyCon video on Django performance
You can find this video on Optimizing Python & Django performance in three different places:
- Microsoft PyCon 2020 board at aka.ms/pycon2020
- PyCon2020 YouTube channel: Optimizing your Python & Django apps with Postgres superpowers
- On this very blog post, below
On the Microsoft Board for PyCon2020, you will find a ton of useful videos created by my teammates on the Python team here at Microsoft. I found Nina’s talk on debugging especially interesting! And my Postgres teammate Sai Srirampur also created an (awesome) short demo on Building HTAP applications with Python and Postgres—in Azure.
But before you go watch my PyCon video, let me tell you a few things that I think you will find useful when watching.
Tools to help optimize Python
Django and Postgres have tools for you to use in order to see your SQL queries executed. In my experience, looking into executed SQL queries is the best way to make sure you don’t let your ORM trick you into running redundant, unoptimised queries.
A tool that I’ve used a lot working with Django, is the django debug toolbar. When you set
DEBUG=True in your settings, django-debug-toolbar will show up in your template. You can see all the queries executed by your view/template and thanks to the traceback, you can easily find in your code where the queries come from.
The django-debug-toolbar also gives you the execution time and you can have the result of
EXPLAIN on that query, which is great to debug slow queries.
In the workshop, I use django-debug-toolbar for the traceback and timing functionalities.
Postgres logs in your day to day life
So, I use Postgres logs in my day to day life. Postgres logs give you a sure way to see all queries ran on your database.
And beyond Django, if you are working on different apps, with maybe different languages, the great thing about logs is that you can switch environments, whether it’s Django, flask, or a Golang API. Your Postgres logs will still be available once configured.
To see how I configure my local Postgres to show all queries, you can watch my PyCon workshop.
pg_stat_statements, the most useful Postgres extension (right?)
And finally, the tool I introduce is pg_stat_statements. It’s a Postgres extension for tracking execution statistics of all SQL statements executed by a server. You can see how many times a SQL query was executed—as well as the minimum, maximum, and mean execution times…
pg_stat_statements is a great tool to find slow queries on your production systems, and in your local Postgres environments too.
Avoiding loops in Django
As I mentioned, if you don’t pay attention to your Postgres logs, you can quickly end up with unexpected queries. One very common issue is loops.
In the workshop, I go through the following example:
I have a view with a queryset of
Campaigns filtered on a company
And in the template, I have a loop, there is
for loop displaying links to the detail view of related objects . This loops triggers hundreds of queries… Which is not something that you want! As a good friend once told me “There must be something wrong in your view if you have more than 20-25 queries.”
And to avoid having O(n) queries because of a loop, Django has a few easy solutions:
Limiting columns to avoid slow SQL queries
One thing that I’m not fond of in ORM is the way it does
SELECT. By default, the ORM not knowing what column you will need, it selects all columns. When your table has many columns, some being potentially long texts, selecting unnecessary columns means pulling more data—which leads to slower queries. Not what you want!
In the workshop, we talk about how to limit columns using Django’s
Pagination tips for Postgres
Finally, pagination can be very useful to avoid pulling thousands of rows (unnecessarily!) from your Postgres database. You have different ways to do pagination. In my PyCon 2020 video about Postgres superpowers, I give you two ways to paginate.
The first one is built-in in Django. It’s based on adding
OFFSET ... LIMIT to queries.
Unfortunately the offset approach to pagination has a couple limitations for Django apps.
- First, Django uses a
COUNT (*)to have the number of pages. The goal being to be able to jump between pages numbers.
COUNTis never good, and the more you can avoid, the better.
OFFSET ... LIMIT ...gets slower in high pages.
OFFSETis used to skip first N rows of a query—to do that, database must fetch rows & order, to return last ones.
A better way to paginate is using keyset pagination. Instead of using
OFFSET ... LIMIT ..., keyset pagination is based on filters on the
ORDER BY fields. I explain more about how to implement keyset pagination in my PyCon video (and I also coverthe performance benefits you’ll see by using keyset pagination instead of
Brand new Discord channel for PyCon
To try to connect to all the many Python developers we would have met in person at PyCon this year, our team has also created a
microsoft-python discord channel. No surprise that the server follows our Open Source Code of Conduct. If you want to join (I’m on the discord), the new microsoft-python discord can be joined via aka.ms/python-discord.
Wishing you performant Django apps that use Postgres at its best
Just like chocolate and peanut butter go better together, so do Python and Postgres. I hope that the Postgres optimization techniques (the “superpowers”) that I walk through in my PyCon 2020 video (below) will help you improve the performance of your Python and Django apps.
Somewhat irrelevant side-note: my PyCon video features hand-painted fish that I created many (many, many) years ago, long before I became a Django and Postgres developer. Enjoy.