Testing Postgres vectorization for faster aggregations

Written by Umur Cubukcu
October 7, 2014

One idea we have wanted to explore further is speeding up in-memory aggregations in PostgreSQL through vectorized execution. The opportunity came up when our intern, Can, took this on as a project during his summer internship. His early numbers are promising: they suggest a 3-4x performance improvement in PostgreSQL for simple SELECT queries with sum/count/group by operations.
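For readers new to the term, here is a rough sketch of what "vectorized" means for a sum/count/group by aggregation. It is not code from the project, and it shows only the difference in control flow, not the performance gain. The two-column row shape, the function names, and the batch size are all assumptions made for illustration. The first function updates the aggregate state once per row; the second pulls rows in column-wise batches and updates the state once per group per batch.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical two-column table: (group key, value to aggregate).
Row = Tuple[str, int]
Result = Dict[str, Tuple[int, int]]  # key -> (sum, count)


def aggregate_row_at_a_time(rows: List[Row]) -> Result:
    """Tuple-at-a-time execution: one aggregate-state update per row."""
    state: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for key, value in rows:
        entry = state[key]
        entry[0] += value  # running sum
        entry[1] += 1      # running count
    return {key: (s, c) for key, (s, c) in state.items()}


def column_batches(
    rows: List[Row], batch_size: int
) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield the rows as column-wise batches: one list per column."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield [key for key, _ in chunk], [value for _, value in chunk]


def aggregate_vectorized(rows: List[Row], batch_size: int = 1024) -> Result:
    """Batch-at-a-time execution: one state update per group per batch."""
    state: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for keys, values in column_batches(rows, batch_size):
        # Collect the positions of each group key within this batch.
        positions: Dict[str, List[int]] = defaultdict(list)
        for i, key in enumerate(keys):
            positions[key].append(i)
        # Fold each group's slice of the value column in one pass.
        for key, idxs in positions.items():
            entry = state[key]
            entry[0] += sum(values[i] for i in idxs)
            entry[1] += len(idxs)
    return {key: (s, c) for key, (s, c) in state.items()}
```

Both functions return the same sums and counts for any input. In a real executor, the benefit of the batched form comes from running tight loops over column arrays and making fewer per-row function calls, which a Python sketch like this cannot demonstrate.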

This is proof-of-concept work, carried out over several weeks and without the production-ready diligence that we always apply to our projects; hence we made a very explicit point of adding "_test" to the end of the project name. That said, it shows good promise for further performance gains, and since the ideas in it can be useful more broadly, we are happy to open source the project.

Please take a look at postgres_vectorization_test on GitHub, and let us know what you think! The readme also has more details on our approach, sample queries, performance comparisons, and instructions on getting it set up as a PostgreSQL extension: https://github.com/citusdata/postgres_vectorization_test

With this, my special thanks go to our intern, Can, for tackling this in the short amount of time he had; to Metin, for mentoring Can; and of course to Ozgun, for pulling everything together for an exciting summer project.

Umur Cubukcu
