Carl Steinbach



Analyzing PostgreSQL Email Archives with PostgreSQL

Written by Carl Steinbach | April 20, 2017

Update: This post was originally published in January 2013. We updated the post in 2017 to include the appropriate SQL commands you can use with Citus. The Citus extension to Postgres is open source and available to try via download or in the cloud via the Azure Cosmos DB for PostgreSQL managed service.

PostgreSQL’s Full Text Search capability is an excellent example of a powerful, relatively new feature that Citus users are able to leverage. First introduced in PostgreSQL 8.3 and continuously improved since then, Full Text Search (FTS) provides the SQL semantics necessary to run keyword searches over a corpus of documents stored in a database. FTS executes these searches efficiently by enabling the use of GIN and GiST indexes. In addition, from our standpoint, one of the best features of FTS is that Citus makes it linearly scalable. Specifically, with Citus one can double the size of the search dataset and still satisfy the same SLAs simply by doubling the number of machines in the Citus cluster.
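To make this concrete, here is a minimal sketch of what an FTS setup on Citus might look like. The `messages` table, its columns, and the index name are hypothetical and chosen only for illustration; `to_tsvector`, `to_tsquery`, the `@@` match operator, and GIN indexes are standard PostgreSQL, and `create_distributed_table` is the Citus function for sharding a table across the cluster.

```sql
-- Hypothetical table of archived emails (names are assumptions, not from the post).
CREATE TABLE messages (
    id      bigint PRIMARY KEY,
    subject text,
    body    text
);

-- Citus: shard the table across worker nodes, using id as the distribution column.
SELECT create_distributed_table('messages', 'id');

-- GIN expression index over the text search vector of subject + body.
-- coalesce() keeps a NULL subject or body from turning the whole expression NULL.
CREATE INDEX messages_fts_idx ON messages
    USING GIN (to_tsvector('english', coalesce(subject, '') || ' ' || coalesce(body, '')));

-- Keyword search: messages that mention both "vacuum" and "index".
-- The WHERE expression matches the index expression, so the planner can use the GIN index.
SELECT id, subject
FROM messages
WHERE to_tsvector('english', coalesce(subject, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('english', 'vacuum & index');
```

Note that the query repeats the indexed expression exactly, including the explicit `'english'` configuration; an expression index is only considered when the query uses the same expression.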

In this blog post we wanted to demonstrate using Full Text Search with Citus, but in order to do so we first needed to find an interesting dataset to use in our examples. After hunting around, we eventually hit on the idea of using email archives from the PostgreSQL mailing lists. Besides having a nicely self-reflexive quality, we also thought this would provide a good opportunity to learn more about the PostgreSQL community. Early in the history of the PostgreSQL project, these lists became the primary communication mechanism for both developers and users, and the collected archives in turn provide a unique opportunity to get a complete view of almost everything that has happened in the project over the past fifteen years.

Keep reading

Running PostgreSQL on Compression-enabled ZFS

Written by Carl Steinbach | April 30, 2013

“Does PostgreSQL support compression?” is a common question we get from our customers, and it’s easy to see why they’re asking. Many of them generate and collect large volumes of log and event stream data and store it in text formats such as JSON or...

Keep reading
