Written by Louise Grandjonc
May 20, 2020
This year, I was so excited about doing a workshop about optimizing Python & Django apps with Postgres superpowers for the PyCon 2020 conference.
Working with other developers on performance is something I always find amazing. So props to the Python people at Microsoft who encouraged my team to create a workshop on Postgres for PyCon 2020. Thank you to Nina Zakharenko, Dan Taylor, & Crystal Kelch.
Alas, we had to change our plans and find other ways to share the PostgreSQL workshop content that we had prepared. So I created a video on the topic of database performance for Django developers, to help teach you the PostgreSQL tips and tricks that have served me well in optimizing my Django apps. These tips are what I call "Postgres superpowers."
As a developer who has worked with Django for years, I know that there are a few traps to avoid in order to keep your apps fast. In this video, I go through a few performance traps to avoid, as well as tools you can use to identify pain points.
You can find this video on Optimizing Python & Django performance in three different places:
On the Microsoft Board for PyCon2020, you will find a ton of useful videos created by my teammates on the Python team here at Microsoft. I found Nina’s talk on debugging especially interesting! And my Postgres teammate Sai Srirampur also created an (awesome) short demo on Building HTAP applications with Python and Postgres—in Azure.
But before you go watch my PyCon video, let me tell you a few things that I think you will find useful when watching.
Django and Postgres have tools you can use to see the SQL queries being executed. In my experience, looking at executed SQL queries is the best way to make sure you don't let your ORM trick you into running redundant, unoptimized queries.
A tool that I've used a lot working with Django is the django-debug-toolbar. When you set DEBUG=True in your settings, django-debug-toolbar will show up in your template. You can see all the queries executed by your view/template and, thanks to the traceback, you can easily find where in your code the queries come from.
The django-debug-toolbar also gives you the execution time, and you can see the result of EXPLAIN on each query, which is great for debugging slow queries.
In the workshop, I use django-debug-toolbar for the traceback and timing functionalities.
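If you want to try it in your own project, here is a minimal setup sketch for django-debug-toolbar (app and URL names follow the library's documentation; the rest of the settings are placeholders):

```python
# Minimal django-debug-toolbar setup (a sketch, not the workshop's exact config).

# settings.py
DEBUG = True

INSTALLED_APPS = [
    # ... your apps ...
    "debug_toolbar",
]

MIDDLEWARE = [
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ... the rest of your middleware ...
]

# The toolbar only renders for requests coming from these IPs.
INTERNAL_IPS = ["127.0.0.1"]

# urls.py
from django.urls import include, path

urlpatterns = [
    # ... your URL patterns ...
    path("__debug__/", include("debug_toolbar.urls")),
]
```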
For the workshop, we created a very simple app, without much JavaScript happening, so there were not many surprises when it came to queries. But if you have API calls in your template, the django-debug-toolbar won't show the queries coming from those calls. Which is why the django-debug-toolbar is nice, but not always suited to every app.
So, in my day-to-day work, I use Postgres logs. Postgres logs give you a sure way to see all the queries run on your database.
And beyond Django, if you are working on different apps, maybe in different languages, the great thing about logs is that you can switch environments: whether it's Django, Flask, or a Golang API, your Postgres logs will still be available once configured.
To see how I configure my local Postgres to show all queries, you can watch my PyCon workshop.
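If you just want something quick to try locally, here is a rough sketch of one way to turn on statement logging from a Django shell. The workshop configures postgresql.conf directly; this ALTER SYSTEM variant is an alternative and assumes your connection user is a Postgres superuser:

```python
# Sketch: enable statement logging without editing postgresql.conf by hand.
# ALTER SYSTEM requires a superuser and must run outside a transaction block,
# which matches Django's default autocommit behaviour for raw cursors.
from django.db import connection

with connection.cursor() as cursor:
    # Log every statement that takes longer than 0 ms, i.e. all statements.
    # (Setting log_statement = 'all' is another way to achieve the same thing.)
    cursor.execute("ALTER SYSTEM SET log_min_duration_statement = 0")
    cursor.execute("SELECT pg_reload_conf()")
```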
And finally, the tool I introduce is pg_stat_statements. It's a Postgres extension for tracking execution statistics of all SQL statements executed by a server. You can see how many times a SQL query was executed—as well as the minimum, maximum, and mean execution times...
pg_stat_statements is a great tool to find slow queries on your production systems, and in your local Postgres environments too.
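As a rough idea of what querying it looks like, here is a sketch you could run from a Django shell, assuming the extension is already loaded and created in your database:

```python
# Sketch: show the ten queries with the highest mean execution time.
# Assumes pg_stat_statements is in shared_preload_libraries and that
# CREATE EXTENSION pg_stat_statements has been run in this database.
# Column names match Postgres 12 and older; Postgres 13+ renamed them
# to total_exec_time / mean_exec_time.
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT calls, round(mean_time::numeric, 2) AS mean_ms, query
        FROM pg_stat_statements
        ORDER BY mean_time DESC
        LIMIT 10
    """)
    for calls, mean_ms, query in cursor.fetchall():
        print(f"{calls:>8} calls  {mean_ms:>10} ms  {query[:80]}")
```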
As I mentioned, if you don't pay attention to your Postgres logs, you can quickly end up with unexpected queries. One very common issue is loops.
In the workshop, I go through the following example:
I have a view with a queryset of Campaigns filtered on a company:
queryset.filter(company=self.request.user.employee.company)
And in the template, there is a for loop displaying links to the detail view of related objects. This loop triggers hundreds of queries... which is not something that you want! As a good friend once told me: "There must be something wrong in your view if you have more than 20-25 queries."
And to avoid having O(n) queries because of a loop, Django has a few easy solutions: select_related and prefetch_related.
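Here is a rough sketch of the difference, using the campaigns queryset from the example above (the Campaign model and the "actions" related name are illustrative, not the workshop's exact models):

```python
# The company, as in the view above.
company = request.user.employee.company

# N+1 pattern: one query for the campaigns, then one extra query every time
# the template accesses campaign.company.
campaigns = Campaign.objects.filter(company=company)

# select_related follows the foreign key with a JOIN in the same query,
# so campaign.company never triggers an extra query.
campaigns = Campaign.objects.filter(company=company).select_related("company")

# prefetch_related is for reverse foreign keys and many-to-many relations:
# Django runs one extra query and stitches the results together in Python.
campaigns = Campaign.objects.filter(company=company).prefetch_related("actions")
```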
One thing that I'm not fond of in ORMs is the way they do SELECT. By default, the ORM doesn't know which columns you will need, so it selects all of them. When your table has many columns, some of them potentially long text, selecting unnecessary columns means pulling more data, which leads to slower queries. Not what you want!
In the workshop, we talk about how to limit columns using Django's only function.
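A minimal sketch of what that looks like (the column names here are illustrative):

```python
# only() limits the selected columns: instead of SELECT "campaign".*,
# Django will select only id, name, and company_id.
campaigns = Campaign.objects.filter(company=company).only("id", "name", "company")

# Careful: accessing a deferred field afterwards (e.g. campaign.description)
# triggers one extra query per object, so only defer columns you truly don't use.
```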
Finally, pagination can be very useful to avoid pulling thousands of rows (unnecessarily!) from your Postgres database. You have different ways to do pagination. In my PyCon 2020 video about Postgres superpowers, I give you two ways to paginate.
The first one is built into Django. It's based on adding OFFSET ... LIMIT to queries.
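For reference, a minimal sketch of Django's built-in offset pagination (the page size and ordering are illustrative):

```python
from django.core.paginator import Paginator

# Django's built-in (offset-based) pagination.
campaigns = Campaign.objects.order_by("id")
paginator = Paginator(campaigns, 25)

# Getting page 3 runs a COUNT(*) to compute the number of pages, then
# roughly: ... ORDER BY "id" LIMIT 25 OFFSET 50
page = paginator.get_page(3)
for campaign in page:
    print(campaign.pk)
```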
Unfortunately, the offset approach to pagination has a couple of limitations for Django apps:

- Django's paginator runs a COUNT(*) to get the number of pages, the goal being to let you jump between page numbers. COUNT is never good, and the more you can avoid it, the better.
- OFFSET ... LIMIT ... gets slower on high page numbers. OFFSET is used to skip the first N rows of a query; to do that, the database must fetch and order those rows, only to return the last ones.

A better way to paginate is keyset pagination. Instead of using OFFSET ... LIMIT ..., keyset pagination is based on filters on the ORDER BY fields, as in the sketch below. I explain more about how to implement keyset pagination in my PyCon video (and I also cover the performance benefits you'll see by using keyset pagination instead of OFFSET.)
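Here is a minimal sketch of what keyset pagination can look like with the Django ORM (the helper function is mine, not the workshop's code; it assumes we order by the primary key):

```python
PAGE_SIZE = 25

def next_page(last_seen_id=None):
    queryset = Campaign.objects.order_by("id")
    if last_seen_id is not None:
        # WHERE "id" > last_seen_id can use the primary key index,
        # so deep pages stay as fast as the first one.
        queryset = queryset.filter(id__gt=last_seen_id)
    return list(queryset[:PAGE_SIZE])

first_page = next_page()
second_page = next_page(last_seen_id=first_page[-1].pk)
```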
To try to connect with all the many Python developers we would have met in person at PyCon this year, our team has also created a microsoft-python Discord channel. No surprise that the server follows our Open Source Code of Conduct. If you want to join (I'm on the Discord), the new microsoft-python Discord can be joined via aka.ms/python-discord.
Just like chocolate and peanut butter go better together, so do Python and Postgres. I hope that the Postgres optimization techniques (the "superpowers") that I walk through in my PyCon 2020 video (below) will help you improve the performance of your Python and Django apps.
Somewhat irrelevant side-note: my PyCon video features hand-painted fish that I created many (many, many) years ago, long before I became a Django and Postgres developer. Enjoy.