This is a past event. See new event
Schedule with video links
Citus Con: An Event for Postgres 2023
Virtual sessions at-a-glance
Each livestream has a different keynote with ~6 unique sessions, so you may want to join both livestreams. And the ~25 on-demand sessions are inspiring too. All talks are in English and will have captioning. Be sure to click the “add to calendar” buttons.
Join the #cituscon channel on Discord to join the conversation.
Explore on-demand talksAmericas live sessions: | Apr 18 | 9a–12:30p PDT |
---|---|
EMEA live sessions: | Apr 19 | 9a–12:30p CEST |
On-demand sessions: |
Schedule
- Live sessions Americas Live Sessions Watch livestream replay
- KEYNOTE: Big Opportunities in Small DataSimon Willison
Not all data needs to be big.
Civic data is more abundant than ever, with local and national governments around the world publishing rich data...
Not all data needs to be big.
Civic data is more abundant than ever, with local and national governments around the world publishing rich data to open data portals. Every organization has untapped data about their business, and every individual has untapped data about their personal activity.
This data is measured in megabytes, not terabytes. What's missing are the tools that help people explore and understand data at this smaller scale: too big for Excel, but not so big that it demands a Big Data warehouse.
I've spent the last five years exploring the world of small data with Datasette, my open source multi-tool for exploring, analyzing and publishing data.
Datasette is built on SQLite. Why SQLite? It's tiny, fast and ubiquitous - and supports a workflow where databases can be created, shared and even discarded with ease.
Through the lens of Datasette and SQLite, I'll explore this problem space and ask how the PostgreSQL ecosystem can evolve to best address the fascinating opportunities presented by Small Data.
Simon WillisonCreator of Datasette.Co-creator of DjangoSimon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He recently completed a JSK journalism fellowship at Stanford, during which he focused on building open source tools for journalism based on his experience working as a data journalist at the UK's Guardian newspaper.
Prior to the fellowship, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010.
He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at https://simonwillison.net/
- Postgres without SQL: Natural language queries using GPT-3 & RustJelte Fennema
Generative AI has gotten very popular in the last year. In this talk I’ll show a Postgres extension that I wrote, which allows you to use the power...
Generative AI has gotten very popular in the last year. In this talk I’ll show a Postgres extension that I wrote, which allows you to use the power of GPT-3 right from your database. This new extension makes it easy to optimize your database schema, query your data, and even distribute your Postgres tables using the open source Citus database extension. All of this by using normal human language, and without the need to know any SQL. So, now you can finally jump on the NoSQL bandwagon, while still using good old trusty Postgres .
In this talk you’ll learn:
- How to write Postgres extensions using Rust (no Rust experience required)
- How generative AI in your database can make your own life easier
- What dangers to watch out for when letting AI loose on your Postgres database
Jelte FennemaSenior Software Engineer @ MicrosoftCurrently I'm working on Citus, Postgres and Pgbouncer at Microsoft. Before that I was a big time Postgres user at Stream, where I worked on low latency APIs for chat and social timelines. I'm one of the current maintainers of the PgBouncer project. I studied at the University of Amsterdam where I got my BSc in Computer Science and MSc in System and Network Engineering.
- Citus and JSON for real-time analytics at Vizor GamesIvan Vyazmitinov
At Vizor Games, we rely completely on the open-source Citus extension to Postgres, since it combines the widely adopted, feature rich and mature...
At Vizor Games, we rely completely on the open-source Citus extension to Postgres, since it combines the widely adopted, feature rich and mature PostgreSQL database with the possibility to scale indefinitely.
In this Citus and PostgreSQL user talk I will discuss:
- Deploying the Citus database cluster on bare metal servers with Gentoo and Btrfs
- Implementation of ETL with big amount of raw data (about 100GB per day)
- Building analytics on semi-structured JSONB data and creation of analytical layer of views over it. (Side-note: Postgres support of JSONBs is a key foundation to our analytics)
- Administering Citus on 20+ databases within one cluster
- Interaction with direct cluster users, including analytics and data scientists
- Integration with BI tools, like Tableau and Metabase
Ivan VyazmitinovVizor Games, Internal Tools Tech LeadI am a Java developer with 5+ years of experience and an accidental DBA. Starting from 2018 in Vizor Games I've gradually accepted role of tech lead and gained experience with Citus during development of internal analytics system.
- Deploying PostgreSQL to Azure with BicepPamela Fox
The best kind of deploy is a repeatable deploy; one that you can redo and know that your infrastructure will be configured exactly the way you like...
The best kind of deploy is a repeatable deploy; one that you can redo and know that your infrastructure will be configured exactly the way you like it. For Azure deployments, the Bicep language enables you to programmatically describe your Azure infrastructure and deploy a whole web app stack with a single command.
In this talk, I'll explain the basics of using Bicep to deploy an app, with a focus on configuring Flexible Servers in the Azure Database for PostgreSQL managed service. I'll show firewall configuration, secret generation and Key Vault storage, and virtual network setup. Together, we'll experience the joy of repeatable PostgreSQL deploys on Azure!
Pamela FoxCloud Advocate in Python, MicrosoftPamela Fox is a human that loves to learn, teach, and create. She's currently a Cloud Advocate in Python at Microsoft, where she helps developers use Python with the many Azure offerings.
On the teaching front, Pamela has taught computer science at UC Berkeley and volunteered in bay area classrooms as part of the TEALS, GirlsWhoCode, and CoderDojo organizations. She also started the SF chapter of GirlDevelopIt, where she taught dozens of web development workshops.
Pamela's been in the tech industry for 15 years now, starting with her first role at Google as one of their first developer advocates. She went on to be an early full-stack engineer at Coursera and spent many years after at Khan Academy, both as an engineer and the creator of the computer programming content.
- Additional IO Observability in Postgres with pg_stat_ioMelanie Plageman
pg_stat_io, a new cumulative statistics view in Postgres, provides additional visibility into IO activity split out by backend type, IO context,...
pg_stat_io, a new cumulative statistics view in Postgres, provides additional visibility into IO activity split out by backend type, IO context, and IO operation.
Previously, IO statistics in Postgres, both those built-in and those available through extensions, did not divide IO activity at a sufficient level of granularity to inform tuning decisions. pg_stat_io addresses these gaps.
Using pared-down walk-throughs of the internal Postgres systems responsible for accessing and persisting your data, this talk will explain the causes of common IO bottlenecks. Then, through systematic breakdowns of potential symptoms visible in pg_stat_io, it will explore the most likely misconfigurations leading to these issues.
Melanie PlagemanSenior Software Engineer at MicrosoftMelanie is a Postgres hacker working at Microsoft. She has worked on the Postgres executor, planner, storage, and statistics subsystems. Most recently she has been hacking on the proposed asynchronous and direct IO patch set. She is passionate about writing maintainable code and about building developer tools.
- On compression of everything in PostgresAndrey Borodin
For many years Postgres had only a pglz algorithm to compress TOASTs and full page images in WAL. But recently things have started to change!
In...
For many years Postgres had only a pglz algorithm to compress TOASTs and full page images in WAL. But recently things have started to change!
In this talk, I'm going to uncover the impact of Postgres features already committed - lz4 everywhere. But what's more important are the prospects of compression application that are yet to come: protocol compression, temp files compression, WAL compression, and data segments compression.
One of the most interesting, required components is the so-called random access compressed file. In some cases, we need even random write compressed files. And there are so many approaches (and working implementations!) to do this. Having this component in the Postgres core would allow many very cool things. I think that this component could increase OLTP performance by a very significant multiplier on IO bottlenecked installations.
Andrey BorodinPostgres ContributorHacking on Postgres since 2016. Associated professor at Yandex School for Data Analysis and Ural Federal University.
- Post-Americas Livestream Wrap UpClaire Giordano Robert Treat
Wrap-up to the Americas Livestream for Citus Con: An Event for Postgres 2023, with livestream co-hosts Claire Giordano and Robert Treat. Includes...
Wrap-up to the Americas Livestream for Citus Con: An Event for Postgres 2023, with livestream co-hosts Claire Giordano and Robert Treat. Includes impressions of the overall event from the virtual hallway track on Discord, to the swag, and links to the other livestream in EMEA—plus highlight video reels of the 25 on-demand talks, too.
Claire GiordanoCitus & Postgres Open Source Champion @ MicrosoftRobert TreatI make Postgres less painful
EMEA Live Sessions Watch livestream replay- KEYNOTE: The Distributed PostgreSQL problem & how Citus solves itMarco Slot
Building Distributed PostgreSQL is perhaps one of the most challenging software engineering projects imaginable. Early on, we decided to architect...
Building Distributed PostgreSQL is perhaps one of the most challenging software engineering projects imaginable. Early on, we decided to architect Citus as a PostgreSQL extension. That way, Citus would always remain part of the PostgreSQL ecosystem even as PostgreSQL keeps developing. Moreover, architecting Citus as an extension made distribution a feature that can simply be added to PostgreSQL without losing any of its versatile feature set or its mature, efficient implementations.
The goal of Citus is to provide high PostgreSQL performance at any scale, but we learned that simply distributing data across machines is rarely sufficient to achieve that. We needed crisp distribution concepts and careful trade-offs that favor workload patterns that benefit from scaling out. Moreover, we had to tackle many complex engineering problems given the large PostgreSQL feature set, failures and concurrency in distributed systems, and mission-critical nature of databases.
In this keynote, I will discuss the main engineering challenges we faced over the past 10 years of developing the fastest, most mature, open-source Distributed PostgreSQL implementation: Citus.
Marco SlotPrincipal Software Engineer on the Citus team at MicrosoftMarco Slot is a Principal Software Engineer on the Citus team at Microsoft. He has been working on PostgreSQL extensions including Citus and pg_cron since 2014 when he joined Citus Data and has continued to lead the Citus development at Microsoft since 2019. Prior to Citus Data, Marco earned a PhD in cooperative self-driving cars at Trinity College Dublin and worked on globally distributed systems at Amazon Web Services.
- Parallelism in PostgreSQL 15Thomas Munro
An introduction to the way PostgreSQL plans and executes parallel queries. In contrast to the distributed multi-server parallelism that the Citus...
An introduction to the way PostgreSQL plans and executes parallel queries. In contrast to the distributed multi-server parallelism that the Citus database extension provides, this talk is about stock PostgreSQL using multiple CPU cores on a single machine to run a single query.
The talk will illustrate the key concepts and problems by working through simple examples of workloads that can and can't benefit from CPU parallelism. There are also many cases where parallelism could help, but doesn't yet. Some of the opportunities for future development will be discussed.
Thomas MunroPostgreSQL hacker working at MicrosoftI am a PostgreSQL developer and committer based in New Zealand. I began working full time on PostgreSQL and related technologies about 8 years ago, first at EnterpriseDB and now Citus/Microsoft. Before that I worked with Unix and relational databases in the web, finance and software industries for a couple of decades. Some of my PostgreSQL interests include query parallelism, taming resource management, transaction machinery, portability, and modernizing database/operating system interfaces. My other interests include hacking on the FreeBSD operating system, trying to learn other languages and trying to ride on various forms of transport with wheels or fins.
- What I learned benchmarking Citus & Postgres performance with HammerDBNaisila Puka
In this session, you will learn about measuring database performance of Azure Cosmos DB for PostgreSQL, the new home for Citus on Azure, through...
In this session, you will learn about measuring database performance of Azure Cosmos DB for PostgreSQL, the new home for Citus on Azure, through the HammerDB benchmark. When I first started, it wasn't obvious how to even get started with "running benchmarks". Therefore, you will get a tour of the whole process, including the following:
- Choosing the Citus database cluster we're interested in testing: its size and the tune of each node
- Configuring HammerDB based on the cluster's capacity in order to utilize the cluster's resources as much as possible
- Interpreting the benchmark result
- Tweaking parameters in steps 1 and 2 based on the CPU and Disk utilization graphs
Naisila PukaSoftware Engineer at MicrosoftSoftware Engineer working on the Citus Engine product in the Postgres team at Microsoft. Fan of organizing & decluttering (which I try to apply in my daily work in Citus as well), algebraic objects, and foreign languages.
- Postgres Storytelling: Support in the Darkest HourBoriss Mejías
This is a story experienced by Monica DeBea, a talented Postgres support engineer based in Brussels. A story some of you may have experienced...
This is a story experienced by Monica DeBea, a talented Postgres support engineer based in Brussels. A story some of you may have experienced firsthand. And if you haven’t yet, then you someday might. So grab a cup of your favorite beverage, and get ready for some Postgres storytelling.
One dark and cold night in February, Monica was the last person in the building, left alone with an almost broken application. The monitoring system she has just put in place immediately started alerting about the age of the oldest transaction. Have you ever heard about transaction ID wraparound in PostgreSQL? This is what Monica was fighting against. Why was vacuum not freezing old transaction ids? The autovacuum process was not doing it, and a manual VACUUM execution was not helping either. Monica searched and searched for the root cause. She kept searching and the clock kept ticking. The sun was gone, her teammates were gone, even the office lights were dark—but Monica was not going to allow any database downtime on her watch. This is a story of Postgres support in the darkest hour.
Boriss MejíasPostgreSQL Solution Architect at EDBI'm a holistic system software engineer, PostgreSQL solution architect at EDB, free software user, and headbanger. I got my PhD researching distributed self-managing systems and I have been working with PostgreSQL since version 9.1. In 2018 I started the PostgreSQL User Group in Belgium. I have presented in many conferences in academia, open source, and Postgres. Being a father of two fantastic daughters, I also have experience in storytelling. Now that they have grown up, I have decided to try telling stories to the Postgres community.
- How we keep Azure Database for PostgreSQL free of bloat to maximize disk spaceBob Wuisman Eleni Siampali
We were facing the issue that bloat was not removed effectively, resulting in poor database and server performance. Now we automatically update the...
We were facing the issue that bloat was not removed effectively, resulting in poor database and server performance. Now we automatically update the autovacuum settings every week on our Azure Database for PostgreSQL flexible servers.
Each week we add new client database servers to our Azure Database for PostgreSQL Flexible Server subscription. Each database is different in size and activity. Some are >300GB with daily insert, update, and delete activities; some are <10 GB with little frequent changes; and then we have everything in between. This changes per table in the databases as well.
Our automation of autovacuum has improved and stabilized Postgres query performance significantly—and saved consistently more than one TB of server disk space and growing with each new database being added.
The weekly automatic updates to the following autovacuum parameters, based on segmenting the database tables in different clusters:
- Autovacuum_vacuum_cost_limit (increase)
- Autovacuum_vacuum_cost_delay (reduce)
The 7 steps:
- Design the solution
- Collect database statistics
- Segment the database tables
- Determine the autovacuuming factors
- Automatically update the factors on each table
- Return diskspace back to the server
- Analyze results and adjust where needed
Bob WuismanEbiquity, Head of Production (Data and Technology)Bob has successfully built business intelligence environments in various businesses. With a holistic vision and process driven mindset, Bob thrives on building teams and sustainably growing data driven operations.
Technical Competences:
Database servers: PostgreSQL server, Data warehousing, data architecture, Data Governance Professional competences include people-, process-, project Management.Eleni SiampaliEbiquity, Senior Data EngineerEleni is a Data Engineer who has years of experience in automating data processes and running data science projects. She thrives in solving complex projects and trying out new applications by combining and integrating different technologies. Within Ebiquity she has automated many processes and enabled automated query tasks across multiple Postgres databases. Technical competences are: PostgreSQL, Python, Databricks, Kubernetes, Docker and Azure Pipelines.
- Citus & Patroni: The Key to Scalable and Fault-Tolerant PostgreSQLAlexander Kukushkin
Citus is an open source extension to PostgreSQL that enables you to scale out your database horizontally by sharding your data across many nodes,...
Citus is an open source extension to PostgreSQL that enables you to scale out your database horizontally by sharding your data across many nodes, and Patroni is an open source tool for managing and automating PostgreSQL High Availability. Combined together three open source projects become a superhero – a scalable PostgreSQL cluster with self-healing capabilities.
In my presentation I will cover implementation details of Patroni & Citus integration, and do a live-demo of cluster deployment and showcase maintenance on Citus worker nodes without interrupting client connections.
Alexander KukushkinPrincipal Software Engineer at MicrosoftAlexander is better known in PostgreSQL community as "the Patroni guy". Patroni is an open source tool for implementing PostgreSQL clustering and high availability. Besides Patroni Alexander occasionally contributes to PostgreSQL and other open source projects and tools, usually Postgres related.
- Post-EMEA Livestream Wrap UpClaire Giordano Jelte Fennema
Wrap-up to the EMEA Livestream for Citus Con: An Event for Postgres 2023, with livestream co-hosts Claire Giordano and Jelte Fennema. Includes...
Wrap-up to the EMEA Livestream for Citus Con: An Event for Postgres 2023, with livestream co-hosts Claire Giordano and Jelte Fennema. Includes impressions of the overall event from the virtual hallway track on Discord, to the swag, and links to the other livestream in Americas—plus highlight video reels of the 25 on-demand talks, too.
Claire GiordanoCitus & Postgres Open Source Champion @ MicrosoftJelte FennemaSenior Software Engineer @ Microsoft
-
-
On-demand Sessions- Auto scaling Azure Cosmos DB for PostgreSQL with Citus, Grafana, & Azure ServerlessLucas Borges Fernandes
Azure Cosmos DB for PostgreSQL is a managed service offering that is powered by the open-source Citus database extension to Postgres. It has many...
Azure Cosmos DB for PostgreSQL is a managed service offering that is powered by the open-source Citus database extension to Postgres. It has many features to help run enterprise-ready applications. One of the top Citus features is the ability to run PostgreSQL at any scale, on a single node as well as a distributed database cluster.
As your application needs to scale, you can add more nodes to the Azure Cosmos DB for PostgreSQL cluster, rebalance existing data to the new Postgres nodes, and seamlessly scale out. However, these operations require manual intervention: a) first you must create alerts on metrics, b) then, based on those alerts, you need to add more nodes, c) then you must kick off and monitor the shard rebalancer. Automating these steps will give you a complete auto scale experience—and make your life so much easier.
Let us learn how to “auto scale” by using existing tools such as Grafana & Azure Serverless.
- Configuring Grafana to capture metrics from Azure Cosmos DB for PostgreSQL
- Configure alerts based on those metrics
- Trigger Azure Serverless function using Grafana alerts to automate the process of scaling out, both automatic node addition and shard rebalancer maintenance
- As icing on the cake, we will also demonstrate automatic tenant isolation!
Lucas Borges FernandesSoftware Engineer, MicrosoftI'm a Brazilian software engineer with years of experience in the field of application development. I also have a MSc in the field of distributed systems. I'm currently working at Microsoft as a Customer Engineer for the CosmosDB for PostgreSQL team.
- Azure AD authentication with PostgreSQL Flexible ServersAndrey Chudnovskiy
Azure Active Directory (Azure AD) is a cloud-based identity and access management service, built on top of OAUTH2 and other widely used protocols....
Azure Active Directory (Azure AD) is a cloud-based identity and access management service, built on top of OAUTH2 and other widely used protocols. The Azure AD authentication feature in the Flexible Server option in Azure Database for PostgreSQL allows users and applications to authenticate against PostgreSQL flexible servers using Azure AD Access tokens.
In this talk you will learn about the security benefits of Azure AD authentication compared to password-based methods.
For the database administrators and application developers we will cover the process of managing PostgreSQL roles for Azure AD Users, Groups, and Applications. Followed by the process of obtaining the tokens and connecting to PostgreSQL in different environments.
This talk will also give you a solid understanding of how Azure AD authentication works in the Flexible Server option in Azure Database for PostgreSQL—plus how to use Azure AD to provide a secure and convenient way to authenticate users/applications connecting to PostgreSQL databases.
Andrey ChudnovskiySoftware Engineer - Azure Database for PostgreSQLAndrey is a Principal Software Engineering Manager @Microsoft leading Azure AD integration for PostgreSQL managed services. Andrey joined the PostgreSQL database world 3 years ago, after a decade of experience building applications and systems at both Microsoft and Google. Prior to leading the Azure AD authentication engineering team, Andrey worked on operational readiness for Flexible Server, one of our managed services for Postgres on Azure.
Beyond Software Engineering, Andrey is a Home cooking and Travel enthusiast, and a doting father of 9 and 11 year-olds.
- Citus from the Customer eyesSai Srirampur
As a Solutions Engineer for the Citus database extension for the past ~7.5 years, I have closely worked with many customers and onboarded them to...
As a Solutions Engineer for the Citus database extension for the past ~7.5 years, I have closely worked with many customers and onboarded them to run their applications on Citus and PostgreSQL. This talk will synthesize all those exciting customer experiences to present an end-to-end journey of what a typical customer onboarding looks like for both Citus open source and for Citus on Azure (now known as Azure Cosmos DB for PostgreSQL.)
This talk will cover steps starting from "product discovery"—where you get to know the value of Citus, to "proof-of-concept"—where you validate that value, to "migration"—where you implement the workload and go into production with Citus.
The following topics would be covered as a part of the talk:
- Why Citus to distribute PostgreSQL?
- Fit use cases and the misfit ones
- Data-modeling (includes distributed data-modeling)
- Performance tuning
- Migration process
- Value of the managed service offering and so on
The talk would be a perfect fit for any customer who is planning to evaluate or is already evaluating Citus for their real-world use case.
Sai SrirampurPrincipal Engineering Manager, Microsoft, Citus Customer Engineering LeadHead of the Solutions Engineering team for the Citus database extension at Microsoft. Works with multi-tenant SaaS, real-time analytics, and time series customers to recommend the right data models, run POCs, assist with migrations, & more. Speaker at PyCon Canada, PyBay, PyCon, SF Postgres Meetup, & Citus Con: An Event for Postgres. Ping pong pro.
- Citus on KubernetesÁlvaro Hernández
If you want to scale Postgres with Citus, on Kubernetes, this is your talk.
This talk will start with the easy part:
- Analyze the current...
If you want to scale Postgres with Citus, on Kubernetes, this is your talk.
This talk will start with the easy part:
- Analyze the current landscape of solutions that support Citus on Kubernetes.
- How to load the extension and create a Citus sharded cluster.
But then move on to the really interesting part:
- How Patroni 3.0 significantly helps with Citus integration.
- How to automate Citus sharded cluster creation to the extreme, thanks to Kubernetes’ extensible API.
- Demo it all live.
The demo and all the concepts shown will be performed on StackGres, a 100% open source project for running Postgres on Kubernetes that supports Citus.
Álvaro HernándezOnGres, FounderÁlvaro is a passionate database and software developer. Founder of OnGres, he has been dedicated to PostgreSQL and R&D in databases since two decades ago.
Álvaro is at heart an open source advocate and developer. Well-known member of the PostgreSQL Community, he has founded the non-profit Fundación PostgreSQL and the Spanish PostgreSQL User Group. He has contributed, among others, the SCRAM authentication library to the Postgres JDBC driver.
You can find him frequently speaking at PostgreSQL, database, cloud, and Java conferences. In the last 12 years, Álvaro has completed more than 110 tech talks (https://aht.es).
- Demystifying Postgres Foreign Key Constraints on CitusOnder Kalaci
Foreign keys are one of the fundamental building blocks of any relational database, including PostgreSQL and Citus.
For a distributed database,...
Foreign keys are one of the fundamental building blocks of any relational database, including PostgreSQL and Citus.
For a distributed database, such as Citus database engine, there are some additional considerations when using these powerful constraints. Certain concepts like reference tables or distribution keys should be well thought along with foreign keys.
In this talk, you are going to learn all about the foreign keys on PostgreSQL as well as their usage on Citus.
Onder KalaciCitus database engine development engineerPrincipal software engineer on the Postgres team at Microsoft. Joined Citus Data to scale out Postgres.
- Divide and Conquer: Multi-tenancy in PostgresKaren Jex
Most organizations or database teams don't have the luxury of looking after just a single database application. Some will manage at least a few...
Most organizations or database teams don't have the luxury of looking after just a single database application. Some will manage at least a few applications, others will have hundreds. In most organizations, the question of multi-tenancy will therefore probably arise.
The Postgres environment can be designed in various different ways to accommodate multiple customers and/or applications, but which is the “right” option?
The answer is, as always, “it depends”. The correct method is the one that provides the required balance between performance, security, high availability and manageability.
In this presentation, we'll look at the different multi-tenant architectures and the advantages and disadvantages of each. It should enable you to design the most appropriate database environment for your applications.
Karen JexSenior Solutions Architect at Crunchy DataKaren was an Oracle DBA for 20 years before starting to work with PostgreSQL databases. She liked them so much that she became a Senior Database Consultant and then a Senior Solutions Architect working exclusively with PostgreSQL. She was once described as “quite personable for a DBA” which she decided to take as a compliment. Outside of the world of databases she loves cycling, mountain biking, skiing and spending time with her family in the mountains where she lives. She is also doing a part-time PhD in Computer Science.
- Do more with less: How to reduce your PostgreSQL costs on AzureVarun Dhawan
Azure Database for PostgreSQL - Flexible Server is a fully-managed relational database based on PostgreSQL open source on Azure.
One of the...
Azure Database for PostgreSQL - Flexible Server is a fully-managed relational database based on PostgreSQL open source on Azure.
One of the greatest benefits of running your application on PostgreSQL in the cloud is being able to scale up and down to meet demand and reduce operational expenditures. And that is especially true when you are experiencing unexpected changes in workload. With Flexible Server in the PostgreSQL managed service on Azure, you can take control of your costs and continuously optimize your cloud spend, while building modern, scalable applications to meet your customers’ needs.
If this sounds exciting, come watch this session where you will learn 10 things you can do today to reduce the cost of Azure Database for PostgreSQL. This talk is for anyone who wants to reduce costs in their PostgreSQL spend in the cloud, whether you are an early-stage startup or a large enterprise with a global footprint.
Varun DhawanProduct @ Microsoft Azure Data ☁️Varun Dhawan is a Senior Product Manager @Microsoft where he works in Azure Open Source Database team focusing on PostgreSQL and Azure Monitor. Prior to Microsoft, Varun had worked at Target Corp, and McKinsey & Company on Observability, Database as a Service and Cloud. ❤️ blogging- http://data-nerd.blog
- For Your Eyes Only: Roles, Privileges, and Security in PostgreSQLRyan Booz
Security is an essential, yet often misunderstood, part of effectively managing a PostgreSQL cluster. As the popularity and adoption of PostgreSQL...
Security is an essential, yet often misunderstood, part of effectively managing a PostgreSQL cluster. As the popularity and adoption of PostgreSQL continues to grow, the interaction of roles, privileges, and object ownership is a recurring theme of confusion in forums and mailing lists.
In this session, I’ll start by defining the Principle of Least Privileges and how this philosophy influences roles and privileges in PostgreSQL. Next, I’ll demonstrate the importance of object ownership in PostgreSQL, how group and user roles can work in unison to effectively manage permissions, multiple ways to manage default privileges, and which privileges should always be modified in any new PostgreSQL database. Finally, I’ll discuss how recent releases are laying a foundation for more flexible and robust security management in the years ahead.
By the end of this session, you will understand how roles work in PostgreSQL, how they impact your daily work, and how to effectively communicate security best practices with others on your team. You’ll leave with solid information to start creating roles that effectively manage access to your cluster and data.
Ryan BoozDeveloper Advocate at RedgateRyan is an Advocate at Redgate focusing on PostgreSQL. Ryan has been working as a PostgreSQL advocate, developer, DBA and product manager for more than 20 years, primarily working with time-series data on PostgreSQL and the Microsoft Data Platform.
Ryan is a long-time DBA, starting with MySQL and Postgres in the late 90s. He spent more than 15 years working with SQL Server before returning to PostgreSQL full-time in 2018. He’s at the top of his game when he's learning something new about the data platform or teaching others about the technology he loves.
- How to copy a Postgres database?Dimitri Fontaine
How to copy a Postgres database from a server to another server is a frequently asked question with surprisingly few answers, most of them being...
How to copy a Postgres database from a server to another server is a frequently asked question with surprisingly few answers, most of them being quite poor. Which is why I started working on pgcopydb, a new Open Source tool that's available precisely to answer that question.
In this talk we will see what makes the topic more interesting than it first seems, and how to fully answer the question, including Capture Data Changes implementation.
Dimitri FontainePostgreSQL Major ContributorDimitri Fontaine is a PostgreSQL Major Contributor (design, review, Extensions, Event Triggers, etc). He maintains and develops pg_auto_failover, pgcopydb, pgloader and other software.
Dimitri has also written “The Art of PostgreSQL”, a book that teaches application developers to replace thousands of lines of code with simple SQL queries.
- Implementing Temporal Features in PostgreSQL: SQL Standard and BeyondHenrietta Dombrovskaya
Since 2011 the SQL standard has defined a set of features providing essential support for temporal data. These features include periods, row...
Since 2011 the SQL standard has defined a set of features providing essential support for temporal data. These features include periods, row versioning, and syntax extensions for SQL commands.
Some of these features already exist in recent releases of PostgreSQL. In addition, PostgreSQL provides several tools, such as GIST indexes, that are not explicitly related to temporal data but can make the implementation of temporal data efficient.
In this talk, we briefly describe temporal data concepts. We then discuss relationships between SQL standards and the implementation of bi-temporal data in PostgreSQL. We further discuss how advanced temporal features, such as multiple time dimensions, are provided on top of standard basic features.
Henrietta DombrovskayaDatabase Architect at DRW, Local organizer of Chicago PostgreSQL User GroupHenrietta Dombrovskaya is a database researcher and developer with 40 years of academic and industrial experience. She holds a Ph.D. in Computer Science from the University of Saint Petersburg, Russia. At present, she is
- Database Architect at DRW
- Local Organizer of the Chicago PostgreSQL User Group
- Active community member, a frequent speaker at the PostgreSQL Conferences
- An author of the PostgreSQL Query Optimization book
- In depth guide to Postgres’ new CINazir Bilal Yavuz
A common challenge while developing on Postgres is making sure your code works on all the different platforms Postgres supports. This often leads...
A common challenge while developing on Postgres is making sure your code works on all the different platforms Postgres supports. This often leads to a complex testing process for your code if you have to do it manually. To make it easier, Postgres recently added CI support. Once this is enabled, it automatically runs the regression tests suites on your code on different platforms and shares the results. Doesn’t it sound like something you should be using?
In this talk, I’m going to talk about how CI works, how to enable it for your repository and share some tips on how to best use it to debug Postgres issues. I will also demonstrate how CI can make developing Postgres code easier with a real-time debugging session.
Nazir Bilal YavuzSoftware Engineer @Microsoft – Open-Source PG TeamNazir Bilal Yavuz is a software engineer on the Open-Source PG Team at Microsoft. Bilal earned his B.S degree in Computer Engineering at METU. He also worked as a DevOps Engineer at English Central before joining Microsoft two years ago. He enjoys sports, playing video games, making puzzles and LEGO.
- Lessons learned: Migrating from AWS-Hosted PostgreSQL RDS to Self-Hosted CitusMatt Klein Delaney Mackenzie
Jellyfish is a multi-tenant SaaS application based on PostgreSQL (AWS-hosted RDS). In 2021, we realized we needed a plan for scaling out our DB....
Jellyfish is a multi-tenant SaaS application based on PostgreSQL (AWS-hosted RDS). In 2021, we realized we needed a plan for scaling out our DB. The Citus database extension seemed like a great fit. We appreciated that Citus was an open source extension to PostgreSQL and that we could continue to leverage our investment in PostgreSQL without having to write our own sharding layer.
Implementing Citus to distribute PostgreSQL necessitated leaving behind the managed DB world of RDS and instead self-hosting PostgreSQL on our own machines. We believed that the benefits would be worth the cost of self-hosting, and we set down that path in 2022.
In this talk, we'll share our experience implementing Citus. In many ways we were blown away by the depth and maturity and elegance of what Citus provides as a way of scaling PostgreSQL. At the same time, we ran into significant challenges with adapting our Python/Django codebase and standing up a robust and reliable self-hosted Citus infrastructure.
When you build with PostgreSQL and Citus you get to stand on the shoulders of the giants who have built these open source products for the last 30+ years. We want to share our journey so others can learn from our successes and our missteps—and stand on a higher perch than we did when we set out.
Matt KleinPrincipal Engineer & PostgreSQL expert at JellyfishPrincipal engineer on the platform team. Joined Jellyfish as a founding engineer 5 years ago, pre-series A. Have been a software engineer for almost (gulp) 30 years. Have worked at early stage startups since 2008. Have worked with PostgreSQL since 2010: application and administration (self-hosted and AWS RDS).
Delaney MackenzieEngineering Team Lead at JellyfishDelaney leads a software team at Jellyfish focused on DAGs and data quality. Prior to Jellyfish, he worked on the technical team at a boutique consulting company in Boston, and prior to that he spent almost 5 years at Quantopian (a democratized quant trading startup). While at Quantopian, he worked in many roles, including backend engineer on their event driven backtest system, building out a new educational business line, and running user data science under the auspices of user growth.
He has a degree in computer science from Princeton and spent a while doing computational genetics research before joining the startup world.
- Maps with Django (and PostGIS)Paolo Melchiorre
A map in a website is the best way to make geographic data easily accessible to users because it represents, in a simple way, the information...
A map in a website is the best way to make geographic data easily accessible to users because it represents, in a simple way, the information relating to a specific geographical area and is in fact used by many online services.
Implementing a web map can be complex and many adopt the strategy of using external services, but in most cases this strategy turns out to be a major data and cost management problem.
In this talk we'll see how to create a web map with the Python based web framework Django using its GeoDjango module, storing geographic data in your local database on which to run geospatial queries.
Through this intervention you can learn how to add a map on your website, starting from a simple map based on Spatialite/SQLite up to a more complex and interactive map based on PostGIS/PostgreSQL.
Paolo MelchiorreCTO @ 20tab - Django Software Foundation member - www.paulox.netI’m Paolo Melchiorre, a longtime Python backend developer who contributes to the Django project and gives talks at tech conferences.
I’ve been a GNU/Linux user for over 20 years and I use and promote Free Software.
I graduated in Software Engineering and I’m an alumnus of the University of Bologna, Italy.
I’ve been working in the web for 15 years and now I’m the CTO of 20tab, a pythonic software company, for which I work remotely.
- Multi-tenant SaaS apps made simple on Azure Cosmos DB for PostgreSQLAlicja Kucharczyk Adam Wolk
Cloud computing has given rise to a massive new market of SaaS applications built by ISV's and enterprises, however architecting these multi-tenant...
Cloud computing has given rise to a massive new market of SaaS applications built by ISV's and enterprises, however architecting these multi-tenant SaaS apps to optimize for underlying infrastructure cost—whilst still providing guarantees of isolation and performance—is hard.
In this session, Adam and Alicja will showcase a number of unique features in Azure Cosmos DB for PostgreSQL, a managed service powered by the open source Citus database extension. These Azure Cosmos DB for PostgreSQL features enable developers to build and scale multi-tenant SaaS apps easily with a variety of tenant management features built natively into the database.
Alicja KucharczykMicrosoft, Program Manager - Azure Database for PostgreSQLAlicja, currently working at Microsoft as Program Manager in Azure Database for PostgreSQL team, is a PostgreSQL expert. She is both an experienced developer as well as administrator and PostgreSQL coach with strong practical knowledge of Linux and their mutual interactions. Alicja is particularly interested in performance optimization at different levels. She has consulted a lot of companies, mainly in Poland providing them with working solutions and supporting them in architecting, deploying and maintaining PostgreSQL.
Adam WolkSenior Program ManagerOpenBSD developer interested in database engineering, distributed systems, information security & networking. Over a decade of experience with large and complex systems ranging from the corporate financial sector to the start-up landscape. Experienced team leader developed systems from the ground up from C on ARM devices, through back-end systems in Go, Python and Rails to web front ends in modern JavaScript frameworks—all continuously integrated, deployed and monitored.
Currently pushing at the edge of distributed SQL as a Senior Program Manager for the Citus Database extension, a core component of the Azure Cosmos DB for PostgreSQL managed service.
- Optimizing Postgres for write heavy workloads ft. Checkpoint and WAL configsSamay Sharma
As you ramp up the write load on your Postgres database, you will often benefit from tuning checkpoint and WAL related configs for better...
As you ramp up the write load on your Postgres database, you will often benefit from tuning checkpoint and WAL related configs for better performance. In the last decade, I’ve personally recommended adjusting parameters like max_wal_size, checkpoint_timeout and checkpoint_completion_target to dozens of customers. However, what they’ve found more useful is understanding the concepts so that they can reason on their own on how to tune these for their workload and also understand how other parameters (eg. full_page_writes) influence checkpoints and WAL generation.
In this talk, you’ll learn what checkpoints are, when they are triggered and how they impact database performance. We’ll also go over the most important configs which impact checkpoints and WAL generation, recommend how to set them and how to monitor and adapt them as your workload changes.
Samay SharmaEngineering Manager - PostgreSQL team at MicrosoftSamay is a principal engineering manager in the PostgreSQL team at Microsoft. He has been working with PostgreSQL for almost a decade (at Microsoft and at Citus Data prior to that) as an extension developer, solutions engineer, and an ardent fan of PostgreSQL. Over the last few years, he has been working directly with PostgreSQL customers to improve and optimize their databases. He has a keen interest in making it easier for users to understand PostgreSQL performance.
- Partitioning strategy for Oracle to PostgreSQL migrations on AzureAdithya Kumaranchath
Oracle to PostgreSQL is one of the most common database migrations in recent times. For numerous reasons, we have seen several companies migrate...
Oracle to PostgreSQL is one of the most common database migrations in recent times. For numerous reasons, we have seen several companies migrate their Oracle workloads to PostgreSQL, both in VMs or to Azure Database for PostgreSQL. Table partitioning is a critical concept to achieve response times and SLAs with PostgreSQL. While a few open-source and third-party tools migrate the table schema and packages, there are not out-of-the-box tools that migrate partitions. So, partitioning strategy has become a very important topic for migration. As part of the migration, it is very important to have a good partitioning strategy due to the difference between Oracle and PostgreSQL. A bad partitioning strategy can lead to bloating and the vacuuming can take time.
As part of this session, you will get familiar with Oracle and PostgreSQL partitioning. We will discuss tools and extensions that you can use to migrate partitions from Oracle to PostgreSQL. We will also discuss partitioning strategies compared to Oracle, and how you can choose alternatives.
Adithya KumaranchathMicrosoft, Engineering Architect @Azure DataAdi joined Microsoft 10 years ago. He is currently an Engineering Architect in Azure Data. He has been working with customers from various industries across US, UK, Asia, and Europe, to solve complex data migration and engineering problems. His passion for programming has kept him close to the leading-edge tech and has contributed to several ideas, architecture patterns, and methodologies that are used by many teams within and outside of Microsoft. He has presented on numerous occasions at many conferences on technology, data analytics, and machine learning. During his career, he led and worked on many large-scale, unique, and complex data modernization and migration programs across industries including aerospace, aviation, pharmaceutical, energy, manufacturing, and retail.
- Postgres and the Artificial Intelligence LandscapeBruce Momjian
Artificial intelligence, machine learning, and deep learning are intertwined capabilities that attempt to solve problems that defy traditional...
Artificial intelligence, machine learning, and deep learning are intertwined capabilities that attempt to solve problems that defy traditional computational solutions — problems include fraud detection, voice recognition, and search result recommendations. While they defy simple computation, they are computationally expensive, involving computation of perhaps millions of probabilities and weights. While these computations can be done outside of the database, there are specific advantages of doing machine learning inside the database, close to where the data is stored. This presentation explains how to do machine learning inside the Postgres database.
Bruce MomjianEDBBruce Momjian is co-founder and core team member of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996. He has been employed by EDB since 2006. He has spoken at many international open-source conferences and is the author of PostgreSQL: Introduction and Concepts, published by Addison-Wesley. Prior to his involvement with PostgreSQL, Bruce worked as a consultant, developing custom database applications for some of the world's largest law firms. As an academic, Bruce holds a Masters in Education, an honorary doctorate, was a high school computer science teacher, and lectures internationally.
- PostgreSQL performance tips you have never seen beforeHans-Jürgen Schönig
In PostgreSQL a lot can be done to speed up queries. However, many of the tricks are not very well known and need some more attention. This...
In PostgreSQL a lot can be done to speed up queries. However, many of the tricks are not very well known and need some more attention. This presentation will show some trickery which is possible but most people are not aware of. Learn how to unleash a lot of hidden performance potential.
Hans-Jürgen SchönigCEO of CYBERTECHans-Jürgen Schönig has been a PostgreSQL professional for 20 years and helps customers around the globe to work with PostgreSQL. He has written a dozen books on PostgreSQL.
- PostgreSQL Security: Defending Against External AttacksTaras Kloba
In this talk, we will explore the various ways that external attackers can target PostgreSQL deployments and how to defend against these threats....
In this talk, we will explore the various ways that external attackers can target PostgreSQL deployments and how to defend against these threats. We will cover topics such as authentication security, protecting queries and data from network snooping, and preventing session interception. We will also discuss best practices for securing backups and preventing server theft. This talk will provide practical tips and techniques for ensuring the security of your PostgreSQL deployment and protecting against external attacks. Attendees will come away with a deeper understanding of the vulnerabilities that can affect PostgreSQL and how to mitigate them.
Taras KlobaBig Data Engineering Manager at SoftServeMy name is Taras. I have more than 12 years of technical experience in the IT sector with such domain areas as banking, online advertising, and gambling. I always help my clients make effective business decisions based on ever-growing data. Furthermore, I am a founder of the PostgreSQL Ukraine community and also drive the Big Data Community Ukraine by organizing meetings with world-known experts. I was nominated Best Software Architect of Ukraine in 2019 by Ukrainian IT Awards and Microsoft Certified Trainer 2022-2023.
- Practical approach to building real-time analytics for cybersecurity applicationsSlava Moudry
Cybersecurity companies use many metrics and need to filter a lot of data to find security threats. This talk describes a practical approach to...
Cybersecurity companies use many metrics and need to filter a lot of data to find security threats. This talk describes a practical approach to design a database schema and ingestion workflow that employs the Citus database extension to Postgres to power Analytics for cybersecurity data. Similar approach can be used in other time-series applications.
The aspects you will learn about are:
- Advantages of Citus and Postgres over other streaming technologies
- Choosing best data structures for table columns
- Choice of distribution key to ensure colocation and even distribution in a Citus cluster
- Best practices with native Postgres partitioning (e.g. pg_partman)
- Schema design to support flexible API request queries
- Optimizing data ingestion and data aggregation using Manual Query Propagation
- Keeping things running smoothly. Watching for I/O spikes. Optimizing autovacuum settings
Slava MoudryDatabase developer focusing on real-time analyticsSlava was born in Kiev, Ukraine. Since 2003 he's been building data warehouse applications using Oracle and Teradata. Since 2005 Slava has used Postgres to build analytics and data warehouses. He worked as Postgres DBA, Database engineer in several companies in the Bay Area. In 2013 Slava used Postgres and Citus to build Analytics at Cloudflare. He continued working with Citus on real-time analytics at IronNet.
- Self Hosted Citus using Cloud ComputingPaul De Audney
A break down of SafetyCulture’s journey moving workload onto a self hosted Citus cluster for their SaaS business.
Solution R&D
- Is Citus the...
A break down of SafetyCulture’s journey moving workload onto a self hosted Citus cluster for their SaaS business.
Solution R&D
- Is Citus the right tool for our problem?
- Can’t we just use PostgreSQL native partitioning with FDW/remote partitions?
Proof of Concept
- Does it actually work like it says it will work (for our workload)?
Building the business case
- What problem are we solving and why is it important to the business?
- What is the impact on other teams?
Marshalling a team
- Who are the people we need to make this work and tie it all together into a production service?
Problem Spaces
- Infrastructure deployment & management.
- High availability tooling & systems to ensure reliable operation at scale.
- Development lifecycle / what does this look like in our dev/staging/production environments.
Gotchas
- Segfaults after upgrading from 10.X to 11.X
Testing & Validation
- Load testing
- Failover testing
- Replication modes + load testing (async vs sync modes)
- Bootstraping a standby from node failures
- Upgrade deployment strategies
Going Live
- This can be another whole talk.. but some quick tl’dr of our how we did dual writes. And in some cases dual service stack to validate the data on Citus against the native PostgreSQL solution.
Paul De AudneySafetyCulture, Senior Staff EngineerPaul has been working with computers professionally since 1999.
He's been working in the infrastructure/sysadmin space predominantly with a few years here and there as a manager mostly for startups and scale up companies.
He's been using PostgreSQL on and off since before there was an auto vacuuming capability.
- Troubleshooting High CPU Utilization in PostgreSQL DatabasesDmitry Romanoff
PostgreSQL is one of the most popular databases in the World.
Central processing unit (CPU) usage is one of the critical metrics to check for a...
PostgreSQL is one of the most popular databases in the World.
Central processing unit (CPU) usage is one of the critical metrics to check for a PostgreSQL DB (database) instance. Checking CPU utilization allows for understanding if the DB is experiencing performance issues like inefficient SQL queries, lack of indexes, and contention.
After seeing a high CPU of the PostgreSQL DB instance, it’s essential to find the root cause of the issue. Queries can be poorly written, too frequent, or heavy.
During the session, I will demonstrate useful SQL queries that can be run on the PostgreSQL DB instance to investigate the high CPU utilization. It will also help you understand what’s running in the database instance that impacts CPU usage.
Dmitry RomanoffDevOps Data Tech Lead at JFrogDmitry is an experienced, dedicated, and highly motivated DevOps Data Lead, Cloud Architect, and DBA. He has extensive experience in the design, development, implementation, and management of large, complex, high-performance, and mission-critical system/software/cloud-based solutions. Dmitry has been giving lectures at conferences and publishing articles and blogs.
- Twisting PostgreSQL into a Document Db and Event StoreJeremy Miller
PostgreSQL has some unusually powerful functionality for storing, retrieving, and querying on JSON data. How powerful you ask? Enough that the...
PostgreSQL has some unusually powerful functionality for storing, retrieving, and querying on JSON data. How powerful you ask? Enough that the Marten (https://martendb.io) project has successfully utilized the PostgreSQL JSONB type to build out a fully fledged document database and event store feature set. In this talk I'll show you what we've done, the bits of PostgreSQL that's made it possible, and explore how we're working with PostgreSQL to allow our users to scale to much larger data sets.
Jeremy MillerSenior Director of Software Architecture, MedeAnalyticsJeremy Miller is the Senior Director of Software Architecture at MedeAnalytics. Jeremy began his software career writing "Shadow IT" applications to automate his tedious engineering documentation, then wandered into software development because it looked like more fun. Jeremy is heavily involved in open source .NET development as the lead developer of Marten, Lamar, Alba, and other projects in the JasperFx family. Jeremy occasionally manages to write about various software topics at http://jeremydmiller.com.
- Unconventional ways to index UUIDs in PostgreSQLHaki Benita
You probably have at least one or two UUIDs in your schema with a good ol' B-Tree index. But, have you ever considered that PostgreSQL may have...
You probably have at least one or two UUIDs in your schema with a good ol' B-Tree index. But, have you ever considered that PostgreSQL may have other options for you?
In this talk I present unconventional ways to index UUIDs in PostgreSQL.
Haki BenitaDeveloper & technical lead specializing in databasesA tech lead specializing in databases, web development and performance tuning. Check out hakibenita.com
- Understanding & Managing Postgres Table BloatChelsea Dole
When making the jump from "running Postgres", to "running Postgres at scale", managing table bloat is a hurdle which many engineers trip over. By...
When making the jump from "running Postgres", to "running Postgres at scale", managing table bloat is a hurdle which many engineers trip over. By diving into topics such as Multi-Version Concurrency Control (MVCC), autovacuum, and table design, this talk will explain what table bloat is, why it occurs, how to mitigate it, and (best of all) how to avoid tripping over it in the first place.
Chelsea DoleSenior Software Engineer, BrexChelsea Dole is a Senior Software Engineer at Brex, a fintech startup providing B2B credit access and expense management software.
As an engineer on the Data Storage team, she builds Postgres infrastructure at scale, performs query optimization, and seeks to up-level organizational understanding of databases.
Chelsea is active within the Postgres and Python communities, and has presented talks about cloud computing, data storage, and Python.
-
Subscribe to notifications to keep up with Citus Con news
Join the conversation
The Postgres and Citus team at Microsoft is proud to be the host of Citus Con: An Event for Postgres.
©2024 Citus Data, a Microsoft Company. All rights reserved.
Postgres, PostgreSQL and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.