Paul Ramsey of CartoDB Talks About PostGIS and PGConf Silicon Valley

CartoDB’s Paul Ramsey spoke to us recently about the PGConf Silicon Valley PostgreSQL conference which is November 17-18 at the South San Francisco Conference Center.

Paul’s talk at next month’s PGConf Silicon Valley PostgreSQL conference is titled “PostGIS Gotchas: the things you didn’t know that lead to confusion and dismay.” He’ll be covering the things that make PostGIS different from other database extensions – and will be pointing out the “gotchas” that arise from those things… “so you can use PostGIS without being Got.”

PostGIS is an open source program that adds support for spatial types to the PostgreSQL database – enabling storage and queries around location and mapping to be run in SQL.

Unfortunately, spatial types aren’t quite like other types: they’re bigger, they require a lot of complicated algorithms to reason with and they are built against an implicit model of the world.

This is generally understood by GIS people “but can be a foreign world to database people,” explains Paul, a solutions architect at CartoDB. “This can lead to Gotchas: things that we think should, in a rational world, work fine, but for some reason do not.”

A bit more about Paul: He’s an open source and enterprise IT expert with specialization in geospatial technology. He is passionate about building large geospatial systems, hacking on PostGIS, working with open source communities – and leading technology teams to build amazing things. I spoke with him about his talk, PostGIS and more.

Terry: Paul, In addition to basic location awareness, what other features does PostGIS offer DBAs that they might not be aware of? In other words, why should they take note and learn more about it?

Paul: The most under-appreciated aspect of spatial types in general is the ability of location to serve as a “universal key”. You can join otherwise disparate data together using the location as a key, or as a loose filter. For example, multiple customer lists might be hard to match using names and address strings along (because of issues with standardization or duplicate names, or small errors) but the same data can be put together quite handily when the location is added as a filter. The other big feature is the variety of spatial types PostGIS has. While it’s true that a lot of location logic is based around point data, there is also lots of data out there modeled as polygons, and lines, and having a type that can support those models transparently is very useful.

Terry: Could you list a few companies that are using PostGIS, along with how they are using it?

Paul: Zonar Systems in Seattle is a fleet management company that has been using PostGIS from very early on, as the repository for big streams of GPS data from school bus and delivery truck fleets. We tend to think about that kind of data as a “moving map”, in real time, but they actually generate the most value for customers with retrospective analyses: what’s the total fleet mileage in a month per county; what routes did this vehicle travel; based on mileage, warn when maintenance cycles should happen, that kind of thing.

RedFin is a real estate online service, which originally started out as a MySQL system but switched over to a PostgreSQL/PostGIS system, primarily because they could get better spatial searching on the PgSQL system. The main advantage was the PgSQL query planner and the fact that PostGIS integrates tightly with it, providing good query plans that account for spatial selectivity as well as a non-spatial selectivity.

PlanetLabs is a really interesting “agile aerospace” company launching 100s of micro-sats for earth observation. They use PostGIS to maintain their catalog of imagery, so they can serve up an imagery access API, and to coordinate the huge amount of processing they do to convert raw scenes into rectified imagery for end users.

Terry: How you are working with PostGIS at CartoDB?

Paul: The engine underneath the CartoDB mapping editor is actually PostgreSQL/PostGIS. This gives us a lot of flexibility in building fancy applications, since we have the full power of PostGIS and PostgreSQL for spatial analytics, and we’re rolling out more spatial BI tools over the coming year.

Any given map layer in CartoDB is actually expressed as just a pairing of a SQL statement (to generate the data to map) and a style document (to apply colors and thematic rendering to the raw data). Using the API, you can actually change the underlying SQL for a map layer on the fly, so web mapping applications with CartoDB can be very dynamic and user-driven.

I personally have been spending a lot of time over the last few months on little efficiency enhancements to make the rendering process faster and less network intensive. Spatial data tends to be a lot larger than other data, so reducing size can have really big effects on performance.

Terry: In looking through the agenda, what other talks are you interested in attending at the PGConf Silicon Valley conference?

Paul: As a rapidly growing company, a lot of our pain points are around scale, both in terms of of number of users and also data sizes. So, I’ll be going to talks about users who run at scale, like TripAdvisor, Amazon RDS, and Heroku, and about new technologies for handling large analysis workloads, like PipelineDB and pg_shard.

I’m really looking forward to PgConfSV!

Terry: Thank you for speaking with us, Paul.

Registration is open for PGConf Silicon Valley now. Use discount code CitusData20 for a 20% savings off the current prices. We hope to see you at PGConf SV!