Citus 10 is out! New features include columnar storage & Citus on a single node—plus we’ve open-sourced the shard rebalancer. Read the Citus 10 blog.
Data security and data privacy are important, no one disputes that. We all want to keep private things private and to keep our data secure. And yet, data needs to be shared, to enable insights, to help organizations observe patterns and have those “ah-ha” moments. None of us want the extreme where, in an effort to keep data secure, there is no access to data of any form within your organization, and the result is no business insights or analytics. With GDPR going into effect, you’ve likely been rethinking what security controls you have in place.
Here at Citus Data we collaborate with SaaS businesses and larger enterprises alike, generally to consult on Postgres data models and how to best scale out their database. (Our Citus extension to Postgres enables you to scale out Postgres horizontally. The benefit: performance.) In working with teams, one common thing we’ve seen companies do is to restrict who can see which bits of Personally Identifiable Information (PII) within your database. There are a number of approaches, including heavyweight ETL processes that mask PII bits. An ETL process tends to introduce a certain amount of latency from the time data is in your system until the time it can be analyzed.
Fortunately, Postgres provides a few primitives that can be used directly within your database to hide PII, while still enabling sophisticated analytics and exploration of data in real time.
Here we’ll look at using Postgres schemas and views to provide access to data while keeping PII safe and hidden.
Let’s first create an initial table that has a mix of PII and also non-PII data that we’ll want to provide access to.
CREATE TABLE users ( email text, firstname text, lastname text, created_at timestamptz, updated_at timestamptz );
Our users table in this example is pretty basic, but even still we have a mix of data we want to make queryable by others in our organization as well as some data that we want to mask. We’re going to hide email, firstname, and last name while creating a new view of our table that others can user for reporting and analytics.
The first step in our process is to revoke the privileges of our new user and to create a new schema. This new schema is important because we can retain the same table names which make communicating about the data a bit easier.
CREATE SCHEMA mask; CREATE USER non_admin; REVOKE ALL privileges ON SCHEMA public from non_admin;
Now we’re going to create our view that has the restricted view of the data with a
CREATE VIEW statement.
CREATE VIEW mask.users AS SELECT created_at from users;
Finally we’re going to grant
SELECT access to our newly created view to others:
GRANT USAGE ON SCHEMA mask to non_admin; GRANT SELECT ON ALL TABLES IN SCHEMA mask TO non_admin;
Now you’ve created a view of your data that restricts access, giving you great insights without exposing PII needlessly. The best part is by leveraging
search_path a query
SELECT * FROM users can run on both but return the appropriate view of the data. To explicitly compare those try these two queries as both your super user and as your non_admin user:
SELECT * FROM public.users; SELECT * FROM mask.users;