Schedule with links to videos
Citus Con: An Event for Postgres 2022
No registration required. Each livestream has a different keynote with a total of ~6 unique sessions, so you may want to join multiple livestreams. And the ~20 on-demand sessions are pretty inspiring, too. All talks are in English and will have captioning.
Join our Discord channel to join the conversation.
| Session | Date | Time |
|---|---|---|
| Americas livestream | Apr 12 | 9a–12:30p PT |
| APAC livestream | Apr 13 | 11a–2:30p SGT |
| EMEA livestream | Apr 13 | 11a–2:30p CEST |
| On-demand sessions | | |
Over the last 50 years, people have built advisory tools to assist in all aspects of database tuning and optimization. Most of this work is incomplete because it requires humans to make decisions about changes to the database—plus the tools are reactive, fixing problems only after they occur. A “self-driving” database aims to overcome these limitations by anticipating future workload trends, automatically deploying optimizations without human intervention, and learning from its environment to handle new situations. And a self-driving database can be optimized in ways that are not possible today, because the complexity of databases has surpassed the abilities of human experts.
This talk will cover the state of the art in autonomous DBMSs (namely what is possible today and what problems remain) as well as our efforts at Carnegie Mellon to build a true self-driving DBMS in the NoisePage and OtterTune projects. We’ll then explore how PostgreSQL already has most of the instrumentation and extension hooks needed to control its runtime behavior via ML-based planning components. We’ll finish with our “wish-list” of features for PostgreSQL, the building blocks that can help to bring self-driving PostgreSQL closer to reality.
Andy Pavlo is an Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. His (unnatural) infatuation with database systems has inadvertently caused him to incur several distinctions, such as VLDB Early Career Award (2021), NSF CAREER (2019), Sloan Fellowship (2018), and the ACM SIGMOD Jim Gray Best Dissertation Award (2014). He is also the CEO & co-founder of the OtterTune database tuning start-up (2020).
Using a few real-world examples, this talk will walk you through the basics of using tools like perf and eBPF for monitoring and ad-hoc analysis in production Postgres environments.
Common Postgres and OS-level monitoring tools give you useful information to analyze and prevent some kinds of performance problems, but often lack sufficient detail. The perf and eBPF tracing tools let you gather many of the missing details, in some cases cheaply enough to make continuous monitoring feasible.
Ever wondered what the cache hit ratio including both Postgres' shared_buffers and the OS page cache is? Whether queries are slow due to data transferred over the network? How much CPU usage is caused by SSL or a Postgres extension? Learn how to answer these and many other questions!
Andres Freund is a long-time Postgres Developer. He has worked on many aspects of Postgres, including logical decoding, scalability, pluggable table storage, query execution performance, bugfixing and patch review. He works for Microsoft, doing open source PostgreSQL development.
The “distribution column” (or shard key) lies at the heart of how Citus transforms PostgreSQL into a distributed database. The shard key determines how data is distributed in the cluster and how efficiently your Postgres queries are executed by Citus.
In this talk you will learn various criteria required to choose an optimal shard key. Some criteria include workload type, table sizes, cardinality of column(s), commonality of column(s) across tables etc. We will walk through each of them using real-world examples and present an organized approach that you can follow to pick the right distribution column for your workload.
Finally, as the icing on the cake, we will explore the possibilities for a tool that heuristically assesses the above criteria to automatically predict optimal shard key(s).
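As a rough illustration of why the cardinality of a shard key matters, the sketch below hashes distribution column values to pick a shard and counts how many rows land on each one. It is a toy model only: the shard count of 32, the CRC32 hash, and the helper names `shard_for` and `rows_per_shard` are assumptions made for this example and are not how Citus itself assigns rows to shards.

```python
import zlib
from collections import Counter

SHARD_COUNT = 32  # hypothetical shard count, chosen only for illustration


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution column value to a shard index (toy hash, not Citus's)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def rows_per_shard(keys) -> Counter:
    """Count how many rows land on each shard for an iterable of key values."""
    return Counter(shard_for(k) for k in keys)


# High-cardinality key (for example a tenant id): rows can spread over many shards.
high = rows_per_shard(f"tenant-{i}" for i in range(10_000))

# Low-cardinality key (for example a status flag): rows can use at most 3 shards.
statuses = ["active", "inactive", "pending"]
low = rows_per_shard(statuses[i % 3] for i in range(10_000))

print("shards used with high-cardinality key:", len(high))
print("shards used with low-cardinality key:", len(low))
```

With only three distinct values, the second key can never occupy more than three shards no matter how many shards exist, which is one reason column cardinality appears among the selection criteria.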
I lead the Customer Engineering team of Citus, within Microsoft. This team is responsible for making technical onboarding easy for customers. This includes providing direct expert guidance to customers implementing Citus and building tools to make the migration journey seamless.
Learn more about developing and refining an algorithm for prefetching data blocks from the application layer.
Data prefetching (often called "readahead") is a necessary component of PostgreSQL's proposed new IO paradigm: direct and asynchronous IO. Though as of PostgreSQL 14, prefetching is already in use in limited contexts (e.g. bitmap heap scan), without kernel readahead, Postgres must implement additional prefetching of its own.
This talk will cover the basics of IO prefetching, the prefetching algorithm proposed with the direct and asynchronous IO patch set, and the process of refining a prefetching model.
Melanie is a Postgres hacker working at Microsoft. She has worked on the Postgres executor, planner, storage, and statistics subsystems. Most recently she has been hacking on the proposed asynchronous and direct IO patch set. She is passionate about writing maintainable code and about building developer tools.
With our geospatial application that runs on PostGIS, Citus, and Postgres, some of the biggest global insurance firms translate vast amounts of data into spatial insights. The workload has 3000 global users, with 10-50 of them concurrently performing complex geospatial analysis, crunching millions of rows across 1000s of spatial polygons. Results are expected in real-time. The dataset is 25TB.
In this talk, you’ll learn about our challenges scaling our spatial capabilities as our data volume & number of users (including concurrency) grew. We’ll share our learnings from building a large-scale geospatial workload on Postgres. Our goal was to identify a database that scales, which made us evaluate and land on the PostGIS & Citus extensions to Postgres.
Come learn how PostGIS and Citus meet our application’s requirements: how the spatial indexing of PostGIS combines with the ability to scale out (shard) data across multiple nodes with Citus, without affecting performance.
I work as a Senior Engineering Manager with 19 years of software development experience. I am passionate about architecting high-performing, scalable software applications to solve complex business problems. Having worked primarily on Java applications, my recent accomplishments include leading a large team involved in modernizing a distributed geospatial analytics platform to a cloud-native architecture using microservices, Microsoft Azure, Postgres Citus & Databricks. When I am not hacking away at the keyboard, I enjoy spending time with my family & friends, binge-watching shows on Netflix, and cooking plant-based whole-food meals.
I work as a data engineer with 14 years of experience in software development. My primary skills are building data-centric applications in Oracle, SQL Server & PostgreSQL, along with business intelligence applications. In my current role at Guy Carpenter for the last 9 years, my job has involved supporting real-time analytics applications using geospatial computations in PostGIS. Some of the other things I am deeply involved in are developing high-throughput data pipelines & ETL applications in Azure Cloud using Azure Data Factory & Azure Databricks. Outside of work I love spending time with my family & friends.
The optimizer is the "brain" of the database, interpreting SQL queries and determining the fastest method of execution. This talk uses the explain command to show how the optimizer interprets queries and determines optimal execution. The talk will assist developers and administrators in understanding how Postgres optimally executes their queries and what steps they can take to understand and perhaps improve its behavior.
Bruce Momjian is co-founder and core team member of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996. He has been employed by EDB since 2006. He has spoken at many international open-source conferences and is the author of PostgreSQL: Introduction and Concepts, published by Addison-Wesley. Prior to his involvement with PostgreSQL, Bruce worked as a consultant, developing custom database applications for some of the world's largest law firms. As an academic, Bruce holds a Masters in Education, an honorary doctorate, was a high school computer science teacher, and lectures internationally.
In this wrap-up to the Americas Livestream for Citus Con: An Event for Postgres, hosts Marco Slot and Claire Giordano look back at some of the repeated database themes in the 6 Citus Con talks in this livestream—including pg_stat_statements, integer to BIGINT conversions, database tuning, and visibility into Postgres operations with eBPF. Marco and Claire also chat about what to expect from all the other Postgres talks in Citus Con, in both of the other livestreams as well as in the 20 on-demand talks.
In the latter half of this wrap-up there are cool highlight reels showing 20 to 30 second video snippets for each of the 20 Citus Con on-demand talks—covering topics like Citus internals, time series data, MobilityDB and PostGIS, Postgres monitoring, observability with pg_stat_statements, data modeling, Django and psycopg, strings in Postgres, Postgres auditing on Azure, optimizing Citus queries at Heap, scaling SaaS applications to billions of events, data types, the Azure Database for PostgreSQL managed service, learnings from interviewing the PG community, and automated knob tuning.
Marco Slot is a Principal Software Engineer on the Citus team at Microsoft and is the lead engineer on the Citus extension. He has been working on PostgreSQL extensions including Citus and pg_cron since 2014 when he joined Citus Data. Prior to Citus Data, Marco did a PhD in cooperative self-driving cars at Trinity College Dublin and helped build CloudFront and Route 53 at Amazon Web Services.
Claire Giordano is a Principal PM Manager on the Postgres team at Microsoft, where she serves as head of Citus open source community initiatives. Prior to the Microsoft acquisition of Citus Data in 2019, Claire was VP of marketing at Citus—and she led the team to raise awareness about the Citus extension to Postgres.
Claire has served in leadership roles in engineering, product management, and product marketing at Citus Data, Sun Microsystems, and A9.com, an Amazon company. At Sun, Claire managed the engineering team that created Zones, and led the effort to open source the Solaris operating system. Claire earned an Sc.B. in Applied Mathematics & Computer Science from Brown University and started her career in tech working on developer tools at Sun.
Building a database comes with many questions: Do you create it from scratch, or do you fork an existing database like PostgreSQL or MySQL? What if there were a different way, not used before? (Hint: we called that “unforking from Postgres.”) How do you integrate with all the pieces that make a database work—the ecosystem of people, community, tooling, expertise? What is the role of open source? The cloud? How do you set yourself apart? Why build a database in the first place?
These are some of the questions we faced when we created Citus, as we added the superpowers of distributed tables to PostgreSQL.
Today, Citus has a thriving community of users, driving over a million open source installs per year and powering critical applications for Fortune 500 companies and startups alike, both on and off Azure. In this keynote, we will explore how we navigated these questions, and the 8 keys that drove the growth of Citus. Set in the context of an industry evolving from relational to NoSQL to the best of both, the answers also inform what's ahead for Citus and PostgreSQL.
Umur has been CEO of Citus Data from founding to its acquisition by Microsoft, creating with our team distributed PostgreSQL that runs at any scale. He currently leads the PostgreSQL product team at Azure, as we continue our mission to build the world's best database experience with Postgres, open source, and the cloud.
Many projects involve something resembling a job or message queue. Using a PostgreSQL database for this can be a reasonable choice in some cases. However, it's not without complications. This talk looks at queue-like workloads in detail. The topics covered will include:
I am a PostgreSQL developer and committer based in New Zealand. I began working full time on PostgreSQL and related technologies about 7 years ago, first at EnterpriseDB and now Citus/Microsoft. Before that I worked with Unix and relational databases in the web, finance and software industries for a couple of decades. Some of my PostgreSQL interests include query parallelism, taming resource management, transaction machinery, portability, and modernizing database/operating system interfaces. My other interests include hacking on the FreeBSD operating system, trying to learn other languages and trying to ride various things with wheels.
Postgres is growing like gangbusters: in popularity, in adoption, and in the size of the ecosystem. And over 650 developers contribute code to Postgres: their expertise, dedication, and skill are big factors in the increasing popularity of Postgres. But what if you’re not a developer: are there things you can do to help the Postgres community? Or what if you are a developer, and you love this project, and you want to do even more: are there non-code ways to contribute to Postgres?
In this updated version of the talk I gave in the PostgreSQL devroom at FOSDEM 2020, I’ll walk through 18 important ways you can contribute to Postgres, beyond code—along with tips and resources for getting started.
Claire Giordano is a Principal PM Manager on the Postgres team at Microsoft, where she serves as head of Citus open source community initiatives. Prior to the Microsoft acquisition of Citus Data in 2019, Claire was VP of marketing at Citus—and she led the team to raise awareness about the Citus extension to Postgres.
Claire has served in leadership roles in engineering, product management, and product marketing at Citus Data, Sun Microsystems, and A9.com, an Amazon company. At Sun, Claire managed the engineering team that created Zones, and led the effort to open source the Solaris operating system. Claire earned an Sc.B. in Applied Mathematics & Computer Science from Brown University and started her career in tech working on developer tools at Sun.
In this session Anthony Shaw will show you how to attack Postgres servers using an open-source tool called Hathi. Hathi is used to identify insecure configurations of Postgres and fix them. Think your server is secure? We'll see!
Outline:
Anthony is from Sydney, Australia and is a contributor to many open-source communities, running and contributing to several popular open-source tools for DevOps, security, automation, and code quality. He has been recognized for his contribution to open source, including as Fellow of the Python Software Foundation and member of the Apache Software Foundation. Anthony runs a Python blog and YouTube channel and has recently published a book on the Python compiler.
Do you know how many languages there are on this planet? How about the top 10 languages used on the internet? Is English the language with the largest population of native speakers? From the perspective of diversity and inclusion, we share a common understanding of the importance of multi-language support. Like other major relational database management systems, PostgreSQL provides localization features, specifically locale support, collation support, and character set support. Let's learn about them and put PostgreSQL to use for empowering every person and organization on the planet to achieve more.
Keisuke Takahashi is a Cloud Solution Architect (Data & Analytics) at Microsoft. Keisuke has been with Microsoft since 2021 and is currently responsible for providing technical guidance that helps manufacturing industries in Japan achieve DX step by step by introducing Azure. In his former job, Keisuke led software development and data science. He has been involved with open source software for 24 years and has experience using PostgreSQL in his projects.
Odyssey is an advanced multi-threaded PostgreSQL connection pooler and request router. In the new 1.3 release, we have focused on new features for connection poolers: standby lag polling and transaction pooling for prepared statements. In this session, we will discuss the challenges we met on the road to these features, their limitations, and the workloads they unlock.
Hacking on Postgres since 2016. Associate professor at Yandex School for Data Analysis and Ural Federal University.
In this unscripted wrap up at the end of the APAC livestream at Citus Con: An Event for Postgres, livestream host Aaron Wislang, plus event co-chair Claire Giordano and Postgres committer Thomas Munro discuss the talks in the APAC livestream plus what you can expect from all 38 talks in the Postgres event. Thomas also explained a bit about how the Postgres developer community uses mailing lists. And the topic of the PostgreSQL extension APIs came up too.
This wrap up also includes highlight reels with 20 to 30 second videos for each of the 20 Citus Con on-demand talks.
Aaron Wislang is a Senior Cloud Developer Advocate at Microsoft with over a decade of software development, systems architecture, and security experience across the major clouds, development platforms, and industries. His current areas of focus include all things open source, cloud native, Python, Go, and even Postgres. He joined Microsoft from Rackspace where he was a Senior Systems Architect & Software Developer on the Azure Product & Engineering team. Aaron and his wife currently live in Toronto, Canada with their two boys and their turtle.
Claire Giordano is a Principal PM Manager on the Postgres team at Microsoft, where she serves as head of Citus open source community initiatives. Prior to the Microsoft acquisition of Citus Data in 2019, Claire was VP of marketing at Citus—and she led the team to raise awareness about the Citus extension to Postgres.
Claire has served in leadership roles in engineering, product management, and product marketing at Citus Data, Sun Microsystems, and A9.com, an Amazon company. At Sun, Claire managed the engineering team that created Zones, and led the effort to open source the Solaris operating system. Claire earned an Sc.B. in Applied Mathematics & Computer Science from Brown University and started her career in tech working on developer tools at Sun.
I am a PostgreSQL developer and committer based in New Zealand. I began working full time on PostgreSQL and related technologies about 7 years ago, first at EnterpriseDB and now Citus/Microsoft. Before that I worked with Unix and relational databases in the web, finance and software industries for a couple of decades. Some of my PostgreSQL interests include query parallelism, taming resource management, transaction machinery, portability, and modernizing database/operating system interfaces. My other interests include hacking on the FreeBSD operating system, trying to learn other languages and trying to ride various things with wheels.
PostgreSQL has been around a long time. After 3 decades, some people might even say it’s getting old. But in today’s fast-moving IT world, PostgreSQL is more relevant than ever. From my vantage point as a PostgreSQL consultant, code contributor, and as a member of the PostgreSQL core team, I’ll share lessons I’ve learned from PostgreSQL open source users and developers alike. In this talk, we’ll look at some examples of why and how PostgreSQL is more relevant than ever.
Magnus Hagander is a member of the PostgreSQL Core Team and a developer and code committer in the PostgreSQL Global Development Group.
Magnus is one of the original developers of the Windows port of PostgreSQL. These days, he mostly works on other parts of the PostgreSQL backend, recently with a focus on security features, monitoring and backup/replication interfaces and tools.
He's been a PostgreSQL user since version 6 (with some non-serious use of Postgres 95 before that), and currently serves on the Core Team and as President of the Board for PostgreSQL Europe.
To pay the bills, he is a PostgreSQL and open source software consultant at Redpill Linpro in Stockholm, Sweden, where he works on consulting, support and training services, as well as custom development work.
In this talk I want to present a few lesser-known but useful features you may never have heard about! For example, do you know how to get the number of inserted and updated rows in an upsert? How to create reproducible random data for testing and demonstrations? How to match text against multiple patterns without a complicated condition? How about using \copy with multi-line queries? All that and more... in my talk :)
A tech lead specializing in databases, web development and performance tuning. Check out hakibenita.com
At Algolia, we were long-time customers of Citus Cloud for our multi-region analytics pipeline (~5TB of data). We chose to migrate to HyperScale at the end of 2021. As the analytics stack is a key part of our product, our goal was to have the least downtime possible on our infrastructure. Hence, we designed a migration plan to provide a seamless migration for our users.
What's included in this talk:
What's not included in this talk:
Hello there! My name is Antoine and I am a passionate Software Engineer. In my spare time you can find me gardening or at a rock concert.
Hi, I'm Matthieu, a French software engineer at Algolia. I was drawn into the web dev world when I was 12 and have never let it go since. Beyond this passion, which I now get to make a living from, I enjoy video games, roller skating & electronic music.
Do you want to migrate away from Oracle but are unsure where to start? Are you afraid that the migration effort will be significant and too big to take on? We know how hard it is to start: we’ve been there and have helped many customers go through this process, with lots of lessons learned. Migrating from Oracle shouldn’t be so difficult. We want to announce a brand new tool, OSSom (OSS Open Migrator), that provides an intuitive and simple wizard to generate the migration assessment of your database. OSSom will tell you:
And that’s all presented in a very simple and concise way.
Alicja currently works at Microsoft as an EMEA Global Black Belt OSS Data Tech Specialist. She is a PostgreSQL expert: an experienced developer as well as an administrator and PostgreSQL coach, with strong practical knowledge of Linux and of how the two interact. She is particularly interested in performance optimization at different levels. She has consulted for many companies, mainly in Poland, providing them with working solutions and supporting them in architecting, deploying and maintaining PostgreSQL.
Diaa Radwan is part of the Global Blackbelt team focusing on Open Source databases at Microsoft. He has been supporting and enabling companies across different industry verticals to adopt Open Source technologies for the past 15 years.
Azure Database for PostgreSQL Flexible Server was recently made generally available. It represents a new generation of managed database service built on Linux and contains new capabilities like zone redundant high availability. This session talks about the new capabilities of Flexible Server and shares hands-on experiences and lessons learned introducing it. It covers deployment using Terraform in Azure DevOps Pipelines and highlights the differences compared to Azure Database for PostgreSQL Single Server. Network and security topics and experiences with the new burstable compute tier are further discussed.
Johannes Schuetzner is a software engineer for MB.OS (Mercedes-Benz Operating System) with extensive experience in database technology and Cloud architectures. After having worked in the enterprise application space, he has shifted focus to Cloud-native solutions for connected vehicles. He is currently developing Cloud applications on Azure and Postgres.
This talk will discuss what to expect from the upcoming Citus 11 release. One of the big features is enhanced query throughput scalability by enabling queries from any node. We will show how you can take full advantage of a Citus 11 cluster to scale query throughput, describe other new features we are adding, and share some very high throughput benchmark numbers.
Marco Slot is a Principal Software Engineer on the Citus team at Microsoft and is the lead engineer on the Citus extension. He has been working on PostgreSQL extensions including Citus and pg_cron since 2014 when he joined Citus Data. Prior to Citus Data, Marco did a PhD in cooperative self-driving cars at Trinity College Dublin and helped build CloudFront and Route 53 at Amazon Web Services.
Wrap up post-show discussion of all the 18 talks across the 3 livestreams, plus the 20 on-demand talks at Citus Con: An Event for Postgres. Hosted by Claire Giordano and Marco Slot. In this unscripted session, Claire and Marco chit chat about what to expect in all these 38 Citus Con talks, touching on topics like pg_cron, the return of in person events, and the pithy quote from Magnus Hagander’s keynote that: “nobody needs Postgres but everybody runs it.”
Marco and Claire also discussed the possibilities for more database automation in the future, perhaps triggered by Andy Pavlo’s keynote in the Americas livestream about the building blocks for self-driving PostgreSQL. And the importance of autovacuum, which Samay Sharma shines a light on in his on-demand talk. And then there are all those useful Postgres tips in Haki Benita’s “lesser known features of Postgres.”
This wrap up also includes highlight reels with 20 to 30 second videos for each of the 20 Citus Con on-demand talks:
Marco Slot is a Principal Software Engineer on the Citus team at Microsoft and is the lead engineer on the Citus extension. He has been working on PostgreSQL extensions including Citus and pg_cron since 2014 when he joined Citus Data. Prior to Citus Data, Marco did a PhD in cooperative self-driving cars at Trinity College Dublin and helped build CloudFront and Route 53 at Amazon Web Services.
Claire Giordano is a Principal PM Manager on the Postgres team at Microsoft, where she serves as head of Citus open source community initiatives. Prior to the Microsoft acquisition of Citus Data in 2019, Claire was VP of marketing at Citus—and she led the team to raise awareness about the Citus extension to Postgres.
Claire has served in leadership roles in engineering, product management, and product marketing at Citus Data, Sun Microsystems, and A9.com, an Amazon company. At Sun, Claire managed the engineering team that created Zones, and led the effort to open source the Solaris operating system. Claire earned an Sc.B. in Applied Mathematics & Computer Science from Brown University and started her career in tech working on developer tools at Sun.
Well folks, it is now 2022 and we are still dealing with integer overflow problems. Just a few months ago I came within 24 hours of watching one of the world's tech unicorns come to a stop because of a possible overflow. We all know what the problem is, but what we need to know is how to avoid it, in production, at scale, when it is already too late. So join me as I discuss some of the more advanced work we did to get out of the mess in front of us and help ensure we didn't see problems elsewhere in the chain, including dealing with other data types and on-the-fly schema change needs as well.
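The arithmetic behind this kind of deadline is simple enough to sketch. A 4-byte signed integer tops out at 2,147,483,647, while an 8-byte signed integer tops out at 9,223,372,036,854,775,807. The helper below estimates how long an incrementing key has left at a constant insert rate. The function name `days_until_overflow` and the sample numbers are made up for illustration and are not taken from the talk.

```python
INT4_MAX = 2**31 - 1  # upper bound of a 4-byte signed integer
INT8_MAX = 2**63 - 1  # upper bound of an 8-byte signed integer


def days_until_overflow(current_max_id: int, ids_per_day: int,
                        limit: int = INT4_MAX) -> float:
    """Days left before an incrementing key reaches `limit` at a constant rate."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = max(limit - current_max_id, 0)
    return remaining / ids_per_day


# Hypothetical numbers: 2.1 billion ids already used, 20 million new ids per day.
print(days_until_overflow(2_100_000_000, 20_000_000))
print(days_until_overflow(2_100_000_000, 20_000_000, limit=INT8_MAX))
```

With these made-up inputs the 4-byte key has a bit over two days left, while the same rate against the 8-byte limit is effectively unbounded, which is why converting to a wider type is the usual way out.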
Having spent his early years building data-intensive, real-time systems within the Fortune 500, Robert now works as a Technical Fellow for Instaclustr. A published author and speaker at conferences worldwide, Robert is a recognized industry expert on topics including databases, DevOps, and Open Source. He occasionally blogs at https://xzilla.net.
In this session, we will present MobilityDB and how it integrates with Citus to support big spatiotemporal data management in PostgreSQL. MobilityDB is an open-source moving object database system that extends PostgreSQL and PostGIS with temporal and spatiotemporal types and operations. Its core function is to efficiently store and query mobility tracks, such as vehicle GPS trajectories. MobilityDB is an OSGeo community project (https://www.osgeo.org/projects/mobilitydb/).
The session will include the following:
Mohamed Bakli is a big data researcher and mobility data scientist. He is a member of the MobilityDB Project Steering Committee and the development team, where he specializes in big spatiotemporal data management and query optimization. He has contributed to the development of multiple parts including integration with Citus, query optimizer, data cleaning, Python adapter, benchmarking, and Docker images. Mohamed also has good hands-on experience in Big Data Technologies (Hadoop ecosystem, Spark, HDP3, PrestoDB, MongoDB) and cloud computing (Microsoft Azure, Digital Ocean, Hetzner, AWS). Besides, he has various publications in good venues such as MDM, SigSpatial, SSTD, BigSpatial, SpatialAPI, and J. Geogr Syst.
Data is at the core of our business and our business has been growing incredibly fast. To keep up with the pace, the scalability and reliability of our database were critical. PostgreSQL 13 flex server overdelivered on both.
With Postgres at the core of our technology landscape, we were able to continuously onboard an increasing number of clients without ever having to worry about performance and reliability. For us, Postgres is the place to store all our processed data and metadata and make it accessible to our data analysts. The many capabilities of this type of database, such as postgres_fdw, trigger functions, and calculated columns, enable us to enrich client data and automate our dataflows.
The flex server of Azure PostgreSQL has been a partner that has gone with us along this path of growth, constantly adapting to the changing landscape. We can safely say that the Azure PostgreSQL server will continue to serve as one of the cornerstones of our data architecture.
Bob Wuisman – Data Operations Director Digital Innovation Centre
Building on his years of experience as a Business Process consultant, Bob has successfully built business intelligence environments in various businesses. With a holistic vision and process driven mindset, Bob thrives on building teams and sustainably growing data driven operations.
Technical Competences:
Head of Data Engineering, Machine Learning Engineer, Mechanical Engineer, MSc. Mathematical Modelling, Ph.D. candidate in Math and Informatics, Author of Packt book "Distributed Data Systems with Azure Databricks".
Using a database as powerful as PostgreSQL challenges us to use best practices to get the best performance within the expected cost... But two aspects that are often overlooked are data modeling and governance.
The PostgreSQL or Citus data projects that start right are the ones that will succeed! Choosing the best indexes and the best data types is critical to success—and if your application runs on Azure, choosing the best scaling of these services on Azure matters too! And if your application already exists, and is running on top of PostgreSQL, what improvements are feasible? Let's talk about real-life tuning.
If we put this database triad together (best practices, data modeling, and data governance) it is practically impossible for your project not to be a great success! Join me in this talk and let's go together towards successful projects.
Master in Computer Engineering, TEDx Speaker, passionate about data and databases. After years as a developer, she went on to work as a DBA, architect, and data engineer, and she is now a CSA (Data and AI) at Microsoft. She won the awards:
International speaker, she was the first Brazilian woman to speak at MongoDB World (New York), lectured at Oracle Code One (San Francisco), and PHP Benelux (Belgium). In addition, she is an official LinkedIn Learning instructor, having recorded (in Austria) three courses for the platform.
You are tasked to build a large scale data warehouse. After much reading and listening, you pick the best tech, and load up with data. Fun and profit from now on? In this talk, I will reflect on my journey with VeniceDB and on how to scale a data system over the years from a data modeling perspective. I will cover the types of data and various query design techniques to meet the business requirements of Windows Telemetry.
Min Wei has spent most of his career in the data space, working on Exchange content management, Hadoop/Hive systems, Postgres, and ClickHouse. For the past 5 years he has been passionate about building and operating very large scale data warehouses.
Every time we create a new project with Django, we assess its requirements to choose the best architecture, of which the database is usually the core.
Django is a database-agnostic web framework but natively supports only 4 Open Source databases: PostgreSQL, SQLite, MariaDB and MySQL.
PostgreSQL has the richest feature set of any supported database and some of these features are natively supported directly in Django via its contrib module.
In this talk we’ll see how to use PostgreSQL’s features to our advantage in Django, including the PostgreSQL-exclusive features in its contrib module and other superpowers that can be exploited through third-party packages.
I’m Paolo Melchiorre, a longtime Python backend developer who contributes to the Django project and gives talks at tech conferences.
I’ve been a GNU/Linux user for over 20 years and I use and promote Free Software.
I graduated in Software Engineering and I’m an alumnus of the University of Bologna, Italy.
I’ve been working in the web for 15 years and now I’m the CTO of 20tab, a pythonic software company, for which I work remotely.
Strings are one of the most used types in databases; they can store pretty much any data and don't enforce any rules on the inserted input. Yet too much freedom sometimes leads to inconsistencies: is it Aivan or Aiven? Øyvind or Oyvind? Wine or Whine? These seemingly small differences can have bad side-effects, causing lookups to fail and incorrect aggregation results to be returned. Luckily all is not lost: PostgreSQL has some features that can help us make sense of the chaos.
In this talk you will learn what PostgreSQL has to offer: starting with pattern matching, moving on to regular expressions, and ending with more advanced functionality exposed by the fuzzystrmatch and unaccent extensions. I'll demonstrate which tools can help you fix string inconsistencies and how to avoid making the same mistakes again in the future. This session is recommended for anyone who deeply cares about their (string) data quality.
Francesco comes from Verona, Italy and works as a Developer Advocate at Aiven. With his many years of experience as a data engineer, he has stories to tell and advice for data-wranglers everywhere. Francesco loves sharing knowledge with others as a speaker and writer, and is on a mission to defend the world from bad Italian food!
Citus is a PostgreSQL extension that transforms Postgres into a distributed database. For some users, how this transformation happens is not obvious. In this talk, I'd like to walk you through Citus internals while a SELECT query is processed. By the end of the talk, you will have gained more insight into the internals of Citus.
Are you tired of not knowing who's changing what in your Azure PostgreSQL databases?
To audit Azure Database for PostgreSQL, you can use pgaudit. With auditing, you can track changes, report on usage to auditors, or change user permissions/passwords without breaking functionality.
This session will include an overview, how to set up pgaudit, and a quick demo. Auditing is a key part of any proper database setup, and with this session, you can be an auditing pro in no time!
Josephine Bush has over 10 years of experience as a Database Administrator. Her experience is extensive and broad-based, including in financial, business, and energy data systems using SQL Server, MySQL, Oracle, and PostgreSQL. She is a Microsoft Certified Solutions Expert: Data Management and Analytics. She holds a BS in Information Technology, an MBA in IT Management, and an MS in Data Analytics. She is the author of Learn SQL Database Programming published by Packt in May 2020. She blogs on sqlkitty.com and you can reach her on Twitter @hellosqlkitty.
Managing time series data at scale can be a challenge. PostgreSQL offers many powerful data processing features such as indexes, COPY and SQL—but the high data volumes and ever-growing nature of time series data can cause your database to slow down over time.
Fortunately, Postgres has a built-in solution to this problem: partitioning tables by time range. Partitioning with the Postgres declarative partitioning feature can help you speed up query and ingest times for your time series workloads. However, you’ll still be limited by the memory, CPU, and storage resources of your Postgres server.
The good news is you can scale out your partitioned Postgres tables to handle enormous amounts of data by distributing the partitions across a cluster using Citus.
This talk will guide you through using Postgres with Citus (and pg_cron) for time series data, effectively transforming PostgreSQL into a distributed time series database.
Software engineer at Microsoft. Interested in distributed systems, machine learning, analytics, and anything related. Former researcher with MSc in brain decoding. Football player. Fan of classical music.
This talk will be a mashup of the lessons described in a few recent, popular Heap technical blog posts on this subject. We'll cover
I’m a wannabe philosophy professor turned wannabe tech entrepreneur living in Orlando, FL.
I’ve spent most of my professional career writing Android code, but I’ve also written quite a bit of JavaScript and a little bit of Python, Go, and Ruby. Currently, I'm at Heap on the data science team working in R and Postgres SQL. I've spent a lot of time trying to make our Citus queries fast.
How we scaled ConvertFlow's platform to 40 million visitors per month and 8+ billion events processed, using Postgres and Azure.
I'm a 26-year-old developer from Chile, living in Miami, FL.
Co-founded ConvertFlow, a no-code visitor conversion platform that helps marketing teams at brands, such as Volkswagen, Nectar and Talkspace, scale their marketing workflows without waiting on developers.
Before starting ConvertFlow and taking it through Techstars in 2016, my co-founder and I ran a marketing agency where we provided conversion rate optimization and email marketing services to brands.
Citus’s ability to parallelize write operations is awesome, but it’s not always easy to take it to the extreme. This talk starts with the basics of loading data and when to use \COPY versus single-row inserts, then goes into more advanced topics. Microbatching \COPY operations, a case study in data ordering, tutorials on direct worker writes, and a discussion of distributed triggers will all be covered in this talk.
Colton Shepard is a Citus and Microsoft alumnus, having spent several years as a Solutions Engineer onboarding and migrating Citus customers. He now works at TRM Labs, a blockchain intelligence company that helps prevent fraud and financial crime.
From data types that lose precision with complex math, to dealing with inconsistent support for various data types in older versions and forks, choices of data type can have unforeseen consequences down the road. We'll shine a light on a small selection of such issues.
Renee is a career changer who entered tech after a background in Medical Anthropology, which she used to help a variety of businesses streamline their processes. Constantly intrigued by data collection, storage, and management, she ventured into PostgreSQL and hasn't looked back. She brings her previous non-technical experience as well as her work doing QA for a startup to the stage.
If you have run PostgreSQL for any serious OLTP workload, you have heard of autovacuum. Autovacuum is PostgreSQL’s way of running vacuum regularly to clear bloat from your tables and indexes. However, in spite of having autovacuum on, a large number of PostgreSQL users still see their database bloat increasing. What’s going on?
In the last decade, I have personally worked with 50+ Postgres customers who have struggled to figure out why autovacuum isn’t working how they expect. In this talk, we will walk through what I’ve learned from analyzing and improving these production Postgres databases. You will learn how autovacuum works, how to figure out why it is not working as you expect, and what you can do to fix it.
Samay is a principal engineering manager in the PostgreSQL team at Microsoft. He has been working with PostgreSQL for almost a decade (at Microsoft and at Citus Data prior to that) as an extension developer, solutions engineer, and an ardent fan of PostgreSQL. Over the last few years, he has been working directly with PostgreSQL customers to improve and optimize their databases. He has a keen interest in making it easier for users to understand PostgreSQL performance.
Do you know which queries are acting abnormally today vs. yesterday? Which queries are fast but running 100,000 times per hour? Are there certain times per day that performance lags unexpectedly?
The pg_stat_statements extension is our most valuable tool for understanding the current state of query workloads within your PostgreSQL cluster. Unfortunately, all of the tracked metrics are cumulative until they are reset (either manually or with a restart), making it difficult to use for point-in-time tuning and observability.
In this talk, I'll review the metrics that pg_stat_statements provides and then demonstrate how to save the data to a table periodically for better visibility into your queries' performance and resource usage over time, including sample Grafana dashboards. We'll conclude the talk by discussing the additional benefits of storing this data in a TimescaleDB hypertable, which provides native compression (store more data longer) and automatic data retention policies.
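To make the cumulative-counter problem concrete, here is a minimal sketch of how two periodic snapshots can be turned into per-interval numbers. It is an illustration under stated assumptions, not the speaker's implementation: the `Snapshot` type and `interval_delta` function are hypothetical, and the two fields are only loosely modeled on the `calls` and execution-time counters that pg_stat_statements tracks per query.

```python
from typing import Dict, NamedTuple


class Snapshot(NamedTuple):
    """Cumulative counters for one query at one point in time (hypothetical shape)."""
    calls: int
    total_exec_time_ms: float


def interval_delta(previous: Dict[str, Snapshot],
                   current: Dict[str, Snapshot]) -> Dict[str, Snapshot]:
    """Return per-query activity between two cumulative snapshots."""
    deltas = {}
    for query_id, now in current.items():
        before = previous.get(query_id)
        if before is None or now.calls < before.calls:
            # Either the query is new, or the counters were reset since the
            # previous snapshot; the current values are the best available delta.
            deltas[query_id] = now
        else:
            deltas[query_id] = Snapshot(
                calls=now.calls - before.calls,
                total_exec_time_ms=now.total_exec_time_ms - before.total_exec_time_ms,
            )
    return deltas


if __name__ == "__main__":
    earlier = {"q1": Snapshot(100, 500.0), "q2": Snapshot(50, 20.0)}
    later = {"q1": Snapshot(160, 800.0), "q2": Snapshot(5, 2.0), "q3": Snapshot(1, 0.5)}
    for query_id, delta in interval_delta(earlier, later).items():
        print(query_id, delta)
```

Treating a drop in `calls` as a reset is a simplifying choice made for this sketch; activity that happened between the previous snapshot and the reset is lost either way.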
Ryan is a Developer Advocate at Timescale, the supercharged time-series database built on PostgreSQL. Prior to Timescale, Ryan worked for more than 17 years as a developer, DBA and product manager in multiple ISVs delivering SaaS products based on time-series data.
Ryan is a long-time DBA, starting with MySQL and Postgres in the late 90s. He spent more than 15 years working with SQL Server and the #SQLFamily and has a desire to bring some of that community spirit into the growing PostgreSQL world. He’s at the top of his game when he's learning something new about the data platform or teaching others about the technology he loves.
The newly GA’d Azure Database for PostgreSQL managed service is called “Flexible Server” but what does that mean? What is the new “Flexible Server” option for Postgres and why should you care? In this talk from one of the Azure Database for PostgreSQL product experts at Microsoft, you’ll walk through a high-level overview of the architecture and the 10 reasons to choose Flexible Server for Postgres on Azure.
What are the 10 reasons to consider Flexible Server? This talk covers Flexible Server’s relationship with Linux, lower latency, performance, zone resilient HA, custom maintenance window, lower cost, on-demand stop/start, built-in connection pooling, richer control of server configuration, and telemetry. Spoiler alert: Flexible Server shows very comparable performance for PostgreSQL workloads migrated from on-premises/VMs, while also taking the operational complexity out of your equation by running on a managed service.
Sunil is a seasoned professional with 30+ years of industry experience in databases. He leads the program manager team for Azure Database for PostgreSQL. Previously, Sunil has worked as developer/lead building relational database engine technology, as a tool designer/lead building backup/restore tools, as an application developer/manager enabling B2B commerce and e-learning, and as a principal program manager for SQL Server.
PostgreSQL provides many metrics and information with a substantial amount of documentation regarding monitoring. In my work as a developer for the Citus extension to Postgres, and in my collaboration with the engineers who support our managed Postgres service on Azure, I’ve learned that designing monitoring dashboards and monitoring the most necessary metrics can be tricky.
In this talk, I will walk you through my own developer journey, as I figured out how to create better monitoring dashboards that keep an eye on what’s happening behind the scenes with the Hyperscale (Citus) database clusters. I’ll cover the details of what I learned about Postgres metrics and Postgres logs that can give you insights about performance, reliability, and security.
Sena Gungor is a software engineer on the Postgres team at Microsoft, focused on the Citus extension to Postgres and on database monitoring tools. Sena earned her B.S. degree in Computer Engineering at METU. A certified plant geek, Sena’s desk is surrounded by weeping figs, snake plants, and bonsai. Outside of work, Sena is a big fan of vinyasa yoga—and Postgres of course.
Since early 2020 I have been publishing weekly interviews with members of the PostgreSQL community. So far (February 2022) that's 97 interviews - with many more to come. And we all learned a lot from these!
https://postgresql.life/
Just to name a couple examples:
The interview answers show some commonalities, but also show very different aspects - for example, how people treat their development environments. This talk looks at the answers in the interviews and summarizes what the members of the community think, like and dislike. Statistics and quotes will round out the talk.
Andreas Scherbaum has been working with PostgreSQL since 1997. He is involved in several PostgreSQL-related community projects, is a member of the Board of Directors of the European PostgreSQL User Group, and has also written a PostgreSQL book (in German).
From 2011 he worked for EMC/Greenplum/Pivotal and tackled very big databases. Nowadays he does the same - but with maybe even more and bigger databases - for Adjust GmbH in Berlin.
Database management systems (DBMS) expose dozens of configurable knobs that control runtime behavior. Setting these knobs correctly for an application's workload can improve the performance and efficiency of the DBMS. But such tuning requires considerable efforts from experienced administrators, which is not scalable for large DBMS fleets.
OtterTune is an automated database tuning service that uses machine learning to generate optimized DBMS knob configurations. OtterTune observes the DBMS's workload through its metrics and then trains recommendation models that select better knob values.
In this talk, I will present the lessons learned from deploying OtterTune for real-world PostgreSQL databases. In particular, this talk will highlight why PostgreSQL makes it easier to use an ML-powered tuning service than MySQL does.
Dana Van Aken is co-founder and CTO of OtterTune (https://ottertune.com). She recently completed her Ph.D. in Computer Science at Carnegie Mellon University. Her research interests lie at the intersection of autonomous database technology and machine learning.
Join the conversation #CitusCon
Digital events have an environmental impact too. Citus Con: An Event for Postgres is estimated to produce about 1.033 metric tons of CO2 in 2022, including both the event’s production and attendees streaming live and on-demand. We’re partnering with Tradewater to fully offset the carbon emissions from the Citus Con event in 2022 and then more, becoming carbon negative! Tradewater is a mission-based company focused on the collection and destruction of the most potent greenhouse gases ever made to help prevent a climate crisis. Visit their website to understand your impact by calculating your carbon footprint and join the fight against the climate crisis through the purchase of high-impact offset credits.
The Postgres and Citus team at Microsoft is proud to be the host of Citus Con: An Event for Postgres.