<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Citus Data Blog - Articles by Jason Petersen</title>
  <author>
    <name>Jason Petersen</name>
  </author>
  <subtitle>Scaling data and analytics with Postgres</subtitle>
  <id>https://www.citusdata.com/blog/</id>
  <link href="https://www.citusdata.com/blog/"/>
  <link href="https://www.citusdata.com/blog/feed/jason-petersen.xml" rel="self"/>
  <updated>2017-10-25T19:47:00+00:00</updated>
  <entry>
    <title>What it means to be a Postgres extension</title>
    <link rel="alternate" href="https://www.citusdata.com/blog/2017/10/25/what-it-means-to-be-a-postgresql-extension/"/>
    <id>https://www.citusdata.com/blog/2017/10/25/what-it-means-to-be-a-postgresql-extension/</id>
    <published>2017-10-25T19:47:00+00:00</published>
    <updated>2017-10-25T19:47:00+00:00</updated>
    <author>Jason Petersen</author>
    <content type="html">&lt;p&gt;Nearly 18 months ago, we open sourced our &lt;a href="https://github.com/citusdata/citus"&gt;Citus distributed database&lt;/a&gt; and &lt;a href="/blog/2016/03/24/citus-unforks-goes-open-source/"&gt;&amp;quot;unforked it&amp;quot;&lt;/a&gt; from PostgreSQL by refactoring Citus into a PostgreSQL extension. Seasoned PostgreSQL users likely already know of and use popular PostgreSQL extensions, such as &lt;a href="/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/"&gt;hstore&lt;/a&gt;, PostGIS, and &lt;a href="/2013/01/10/more-on-postgres-performance/"&gt;pg_stat_statements&lt;/a&gt;; however, we realized some of you might appreciate a recap of our journey from fork to extension and beyond. &lt;/p&gt;

&lt;h2&gt;Before Becoming a PostgreSQL Extension&lt;/h2&gt;

&lt;p&gt;Prior to the release of Citus 5.0 in the spring of 2016, our codebase was best described as a fork of PostgreSQL. The open source license used by PostgreSQL is quite liberal and generously allows for forks, which has led to a &lt;a href="https://wiki.postgresql.org/wiki/PostgreSQL_derived_databases"&gt;long history of systems being built on top of PostgreSQL&lt;/a&gt;, including ParAccel, Truviso, Aster Data, and Greenplum. PostgreSQL releases new major versions each year, which are generally backwards compatible from an end-user perspective, but can come with substantial changes in the lower-level PostgreSQL code.&lt;/p&gt;

&lt;p&gt;Being a fork of PostgreSQL means needing to adapt to those low-level changes by &lt;a href="https://git-scm.com/book/en/v2/Git-Branching-Rebasing"&gt;rebasing&lt;/a&gt; atop each new version after it is released, a complex integration process that can easily consume weeks.&lt;/p&gt;

&lt;h2&gt;Transforming Citus from a Fork to an Extension of Postgres&lt;/h2&gt;

&lt;p&gt;Fortunately, PostgreSQL exposes many internal hooks to permit the creation of extensions to expand its capabilities and power. Extensions can be developed against the low-level hooks using C or in higher-level procedural languages. The hooks give direct access to the core of PostgreSQL: scans, utility commands, planning, and execution are just a few of the processes that can be modified or entirely overridden by an extension. Extensions can provide new datatypes, better monitoring, foreign data wrappers, advanced security capabilities, and even entirely new languages for writing stored procedures!&lt;/p&gt;

&lt;p&gt;Our Citus database leverages a majority of the low-level PostgreSQL hooks available to us: we create custom nodes, use them to build custom scans, and have custom planner and executor lifecycles to actually carry out your distributed queries. We even override processing of DDL commands to help perform schema modifications on remote nodes and have recently added a background worker within our extension, which performs distributed deadlock detection on in-flight queries. That we can do all of this in a modular &amp;ldquo;add-on&amp;rdquo; fashion is a testament to the careful design of PostgreSQL&amp;#39;s internals.&lt;/p&gt;

&lt;h2&gt;Benefits of Being an Extension to PostgreSQL&lt;/h2&gt;

&lt;p&gt;Because we no longer maintain a complete fork of the PostgreSQL codebase, the effort required to remain compatible with new versions of PostgreSQL has been dramatically reduced: after a new PostgreSQL version is released, we need only integrate with changes to the interfaces we use to call out to PostgreSQL. These kinds of changes are often more along the lines of &lt;em&gt;a new parameter has been added to a function, fix the call sites&lt;/em&gt; rather than &lt;em&gt;an entire codebase of millions of lines has changed beneath you&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;With this reduction in integration effort, we have recently been able to begin supporting major PostgreSQL releases &lt;em&gt;before&lt;/em&gt; they land. This even includes support for entirely new features in upcoming releases: for instance, Citus 7 includes some awareness of PostgreSQL 10&amp;rsquo;s declarative partitioning syntax, even though Citus 7 shipped a full month before PostgreSQL 10 was released. In the past, when Citus was a fork of PostgreSQL, supporting a new PostgreSQL feature like that would have taken months of integration work.&lt;/p&gt;

&lt;h2&gt;Today: Citus + PostgreSQL = ❤️&lt;/h2&gt;

&lt;p&gt;All of this history leads us to where we are today with &lt;a href="/download/"&gt;Citus&lt;/a&gt;. We continue to push forward with new Citus releases chock-full of features to serve all our customers&amp;mdash;whether they be open-source, enterprise, or cloud&amp;mdash;making it so you don’t have to worry about your database and can get back to building features.&lt;/p&gt;

&lt;p&gt;Citus 7, released in September, was our first release to support a major PostgreSQL version on the &lt;em&gt;day&lt;/em&gt; it was released. As soon as the PostgreSQL 10 packages showed up in PGDG, we released OS packages for Citus that were compatible with the PostgreSQL 10 release. The next step: we immediately began builds against PostgreSQL 11 in our continuous integration environment, the earliest we have ever begun building against a new PostgreSQL version.&lt;/p&gt;

&lt;p&gt;Admittedly we&amp;rsquo;re biased when it comes to PostgreSQL: pretty much everyone on the engineering team at Citus Data is a PostgreSQL fan. Still, we&amp;#39;ve been impressed by the capabilities afforded us as an extension to the &amp;ldquo;world’s most advanced open source database.&amp;rdquo; And we look forward to the many more impressive features on the PostgreSQL roadmap ahead.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href='https://www.citusdata.com/blog/2017/10/25/what-it-means-to-be-a-postgresql-extension/'&gt;citusdata.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>Announcing pg_shard 1.2</title>
    <link rel="alternate" href="https://www.citusdata.com/blog/2015/07/30/announcing-pgshard-12/"/>
    <id>https://www.citusdata.com/blog/2015/07/30/announcing-pgshard-12/</id>
    <published>2015-07-30T00:00:00+00:00</published>
    <updated>2015-07-30T00:00:00+00:00</updated>
    <author>Jason Petersen</author>
    <content type="html">&lt;p&gt;pg_shard continues to gain momentum as a straightforward sharding
extension for PostgreSQL. We’re ecstatic each time we hear about a new
deployment and are always considering what’s next.&lt;/p&gt;

&lt;p&gt;We’ve been hard at work addressing our customers’ needs and have
released a new version of pg_shard this week with our latest efforts.
The changes in this release include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Full CitusDB integration&lt;/strong&gt; — Distribution metadata is &lt;em&gt;always&lt;/em&gt; in
sync with CitusDB&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Better type support&lt;/strong&gt; — Partition by enumeration or composite
types&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Planning improvements&lt;/strong&gt; — Improved internal metadata locking and
function checking&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Usability enhancements&lt;/strong&gt; — Better validations and error messages
during use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upgrading or installing is a breeze: see &lt;a href="https://github.com/citusdata/pg_shard"&gt;pg_shard’s GitHub
page&lt;/a&gt; for detailed instructions.&lt;/p&gt;

&lt;p&gt;Whether you want a distributed document store alongside your normal
PostgreSQL tables or need the extra computational power afforded by a
sharded cluster, pg_shard can help. We continue to grow pg_shard’s
capabilities and are open to feature requests.&lt;/p&gt;

&lt;h3&gt;Got questions?&lt;/h3&gt;

&lt;p&gt;If you have any questions about pg_shard, please contact us using the
&lt;a href="https://groups.google.com/group/pg_shard-users"&gt;pg_shard-users mailing
list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you discover an issue when using pg_shard, please submit it to our
&lt;a href="https://github.com/citusdata/pg_shard/issues"&gt;issue tracker on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Further information is available &lt;a href="/citus-products/pg-shard"&gt;on our
website&lt;/a&gt;, where you are free to &lt;a href="/about-us/contact-citus-data"&gt;contact
us&lt;/a&gt; with any general questions you may
have.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href='https://www.citusdata.com/blog/2015/07/30/announcing-pgshard-12/'&gt;citusdata.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>PGConf Silicon Valley PostgreSQL Conference Call for Speaker Proposals Closes June 15th</title>
    <link rel="alternate" href="https://www.citusdata.com/blog/2015/05/29/pgconf-sv-call-for-proposals/"/>
    <id>https://www.citusdata.com/blog/2015/05/29/pgconf-sv-call-for-proposals/</id>
    <published>2015-05-29T00:00:00+00:00</published>
    <updated>2015-05-29T00:00:00+00:00</updated>
    <author>Jason Petersen</author>
    <content type="html">&lt;p&gt;&lt;a href="http://www.pgconfsv.com/" title="PGConf Silicon Valley"&gt;&lt;strong&gt;PGConf Silicon
Valley&lt;/strong&gt;&lt;/a&gt;
is a technical conference for the PostgreSQL community scheduled for
November 17th and 18th 2015 at the &lt;a href="http://www.ssfconf.com/"&gt;&lt;strong&gt;South San Francisco Conference
Center&lt;/strong&gt;&lt;/a&gt;. The call
for &lt;a href="http://www.pgconfsv.com/#!submit/cy0a" title="Submit a Speaking Proposal"&gt;&lt;strong&gt;speaking
proposals&lt;/strong&gt;&lt;/a&gt;
for PGConf Silicon Valley is open until June 15, 2015.&lt;/p&gt;

&lt;p&gt;We are pleased to welcome our distinguished Conference Committee, which
includes PostgreSQL community leaders from throughout the United States:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Josh Berkus, PostgreSQL Experts&lt;/li&gt;
&lt;li&gt;  Peter van Hardenberg, Heroku&lt;/li&gt;
&lt;li&gt;  Albert Strasheim, Cloudflare&lt;/li&gt;
&lt;li&gt;  Simon Riggs, 2ndQuadrant&lt;/li&gt;
&lt;li&gt;  Elein Mustain, Adobe&lt;/li&gt;
&lt;li&gt;  Bruce Momjian, EnterpriseDB&lt;/li&gt;
&lt;li&gt;  Patrick King, Pandora Internet Radio&lt;/li&gt;
&lt;li&gt;  Ozgun Erdogan, Citus Data&lt;/li&gt;
&lt;li&gt;  Rafael Solari, Neustar&lt;/li&gt;
&lt;li&gt;  Aaron Brashears, Twitch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The conference will bring together the PostgreSQL community for an event
filled with learning, connection building, and speakers from leading
Postgres users and vendors on topics such as performance tuning, backup
and replication, and new technologies.&lt;/p&gt;

&lt;p&gt;PGConf Silicon Valley is the first solely PostgreSQL-focused conference
in Silicon Valley. &lt;a href="http://www.pgconfsv.com/#!sponsors/c120m" title="Sponsors"&gt;&lt;strong&gt;Sponsorship
opportunities&lt;/strong&gt;&lt;/a&gt;
are available. Visit the PGConf Silicon Valley website to register or
get more information about the conference.&lt;/p&gt;

&lt;h5&gt;What&lt;/h5&gt;

&lt;p&gt;PGConf Silicon Valley&lt;/p&gt;

&lt;h5&gt;When&lt;/h5&gt;

&lt;p&gt;November 17–18, 2015&lt;/p&gt;

&lt;h5&gt;Where&lt;/h5&gt;

&lt;p&gt;South San Francisco Conference Center&lt;br&gt;
255 S Airport Blvd&lt;br&gt;
South San Francisco, CA 94080&lt;br&gt;
&lt;a href="https://www.google.com/maps/place/South+San+Francisco+Conference+Center" title="map to conference center"&gt;&lt;strong&gt;(map)&lt;/strong&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;Call for Speakers Deadline&lt;/h5&gt;

&lt;p&gt;June 15, 2015&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href='https://www.citusdata.com/blog/2015/05/29/pgconf-sv-call-for-proposals/'&gt;citusdata.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>Webinar on PostgreSQL Real-Time Analytics Terabyte-Scale Data Ingestion on May 20th</title>
    <link rel="alternate" href="https://www.citusdata.com/blog/2015/05/14/webinar-terabyte-scale-data-ingestion/"/>
    <id>https://www.citusdata.com/blog/2015/05/14/webinar-terabyte-scale-data-ingestion/</id>
    <published>2015-05-14T00:00:00+00:00</published>
    <updated>2015-05-14T00:00:00+00:00</updated>
    <author>Jason Petersen</author>
    <content type="html">&lt;p&gt;Learn how to enable real-time analytics on terabytes of data using
PostgreSQL by combining analytics and operational workloads in a single
database. Join Utku Azman from Citus Data at 10:00 AM PDT on Wednesday,
May 20, 2015 for a webinar that will discuss the challenges of real-time
analytics on large PostgreSQL data sets and present a case study of a
company that has solved them.&lt;/p&gt;

&lt;p&gt;In this live webinar, you will hear how users can differentiate
themselves in a competitive environment by leveraging the rich and
flexible PostgreSQL ecosystem. You will become familiar with the
concepts and methods employed to enable processing of billions of events
in real time using PostgreSQL-based solutions. You will also hear about
real-life implementations including
how &lt;a href="https://www.cloudflare.com/"&gt;&lt;strong&gt;Cloudflare&lt;/strong&gt;&lt;/a&gt; uses &lt;a href="/product"&gt;&lt;strong&gt;Citus&lt;/strong&gt;&lt;/a&gt; to
power real-time analytics for millions of websites on tens of terabytes
of data.&lt;/p&gt;

&lt;p&gt;By the end of the webinar, attendees will have a greater understanding
of Citus features that enable massive parallelization of analytical
queries, real-time ingestion on distributed data sets, columnar storage
and advanced data compression, and high availability and dynamic
scaling. They will also have a better understanding of a real-time
analytics use case on large PostgreSQL data sets.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href='https://www.citusdata.com/blog/2015/05/14/webinar-terabyte-scale-data-ingestion/'&gt;citusdata.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>Webinar on Sharding and Scaling PostgreSQL on April 29th</title>
    <link rel="alternate" href="https://www.citusdata.com/blog/2015/04/22/webinar-scaling-postgresql/"/>
    <id>https://www.citusdata.com/blog/2015/04/22/webinar-scaling-postgresql/</id>
    <published>2015-04-22T00:00:00+00:00</published>
    <updated>2015-04-22T00:00:00+00:00</updated>
    <author>Jason Petersen</author>
    <content type="html">&lt;p&gt;I am hosting a webinar on April 29th at 10:00 am Pacific time to discuss
the core principles of PostgreSQL database scaling. The webinar will
enable attendees to avoid common mistakes that limit future system
capabilities.&lt;/p&gt;

&lt;p&gt;During the webinar, I will discuss how to simplify PostgreSQL scaling by
using &lt;a href="/citus-products/pg-shard"&gt;&lt;strong&gt;pg_shard&lt;/strong&gt;&lt;/a&gt;, an open-source
extension for managing the placement, distribution, and replication of a
sharded database. pg_shard is a standalone PostgreSQL extension
developed by Citus Data which addresses many NoSQL use cases while also
enabling targeted real-time analytical queries. For users who need
complex full-cluster queries and JOINs, pg_shard provides an easy
upgrade path to &lt;a href="/citus-products/citusdb-software"&gt;&lt;strong&gt;CitusDB&lt;/strong&gt;&lt;/a&gt;. The
pg_shard extension maintains full compatibility with the standard
PostgreSQL tool set, types, indexes, and semi-structured data types.&lt;/p&gt;

&lt;p&gt;By the end of the webinar, attendees will understand the core tenets of
database scaling as well as which scaling problems are appropriate to
address with sharding. In addition, I will cover the features and
functionality of pg_shard by presenting use cases where it is being
successfully used in production environments today.&lt;/p&gt;

&lt;p&gt;I hope you will join me on April 29th.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href='https://www.citusdata.com/blog/2015/04/22/webinar-scaling-postgresql/'&gt;citusdata.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>Announcing pg_shard 1.1</title>
    <link rel="alternate" href="https://www.citusdata.com/blog/2015/03/18/pg-shard-11/"/>
    <id>https://www.citusdata.com/blog/2015/03/18/pg-shard-11/</id>
    <published>2015-03-18T00:00:00+00:00</published>
    <updated>2015-03-18T00:00:00+00:00</updated>
    <author>Jason Petersen</author>
    <content type="html">&lt;p&gt;Last winter, we open-sourced &lt;a href="/citus-products/pg-shard"&gt;&lt;strong&gt;pg_shard&lt;/strong&gt;&lt;/a&gt;,
a transparent sharding extension for PostgreSQL. It brought
straightforward sharding capabilities to PostgreSQL, allowing tables and
queries to be distributed across any number of servers.&lt;/p&gt;

&lt;p&gt;Today we’re excited to announce the next release of pg_shard. The
changes in this release include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Improved performance&lt;/strong&gt; — &lt;strong&gt;INSERT&lt;/strong&gt; commands run up to four times
faster&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Shard repair&lt;/strong&gt; — Easily bring inactive placements back up to speed&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Copy script&lt;/strong&gt; — Quickly import data from CSV and other files from
the command line&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CitusDB integration&lt;/strong&gt; — Expose pg_shard’s metadata for CitusDB’s
use&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Resource improvements&lt;/strong&gt; — Execute larger queries than ever before&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more information about recent changes, you can view &lt;a href="https://github.com/citusdata/pg_shard/issues?q=is:closed+sort:updated-desc+-label:invalid++milestone:v1.1"&gt;&lt;strong&gt;all the
issues closed during this release
cycle&lt;/strong&gt;&lt;/a&gt; on
GitHub.&lt;/p&gt;

&lt;p&gt;Upgrading or installing is a breeze: see &lt;a href="https://github.com/citusdata/pg_shard"&gt;&lt;strong&gt;pg_shard’s GitHub
page&lt;/strong&gt;&lt;/a&gt; for detailed
instructions.&lt;/p&gt;

&lt;p&gt;Whether you want a distributed document store alongside your normal
PostgreSQL tables or need the extra computational power afforded by a
sharded cluster, pg_shard can help. We continue to grow pg_shard’s
capabilities and are open to feature requests.&lt;/p&gt;

&lt;h3&gt;Got questions?&lt;/h3&gt;

&lt;p&gt;If you have any questions about pg_shard, please contact us using
the &lt;a href="https://groups.google.com/group/pg_shard-users"&gt;&lt;strong&gt;pg_shard-users mailing
list&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you discover an issue when using pg_shard, please submit it to
our &lt;a href="https://github.com/citusdata/pg_shard/issues"&gt;&lt;strong&gt;issue tracker on
GitHub&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Further information is available &lt;a href="/citus-products/pg-shard"&gt;&lt;strong&gt;on our
website&lt;/strong&gt;&lt;/a&gt;, where you are free to &lt;a href="/about-us/contact-citus-data"&gt;&lt;strong&gt;contact
us&lt;/strong&gt;&lt;/a&gt; with any general questions you may
have.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href='https://www.citusdata.com/blog/2015/03/18/pg-shard-11/'&gt;citusdata.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
  </entry>
</feed>
