I am working with a PostgreSQL database and need to encrypt sensitive data fields (like SSN and email) for security purposes. However, I want to be able to query these encrypted fields efficiently, especially using WHERE clauses, without decrypting each row individually during the query.
I am using pgcrypto to encrypt sensitive data, and I’ve set up a trigger to automatically encrypt the data before inserting it into the database.
The problem arises when I try to query encrypted fields using a WHERE clause. Since the data is encrypted, PostgreSQL cannot directly filter on these fields without decrypting every row, which is inefficient. I’m looking for a way to make WHERE queries feasible on encrypted data.
Is there a recommended approach for handling WHERE clauses on encrypted data in PostgreSQL? I am open to using other methods or functions if they make querying more efficient.
I understand that using a function to decrypt data as needed might result in overhead, as pointed out. I’m trying to avoid performance bottlenecks. I’d appreciate any guidance on minimizing this overhead or achieving an efficient way to query encrypted data.
2
Answers
The short answer: if you store encrypted data in the database, you cannot use an index to search for them efficiently. Security limits performance and ease of use, and in this case, the performance hit is the price you have to pay for security.
The exception to the above is if you use a deterministic encryption method (which is not ideal for security) and search with
=
: then you can use an index to search for the encrypted data.Note that using pgcrypto to encrypt your data is not very secure: you have to send the clear text and the encryption key to the database, so you cannot protect the data from the database administrator. It is better to encrypt the data inside the application.
Actually there are few ways that are still considered secure to search on an encrypted data in postgres:
Blind Indexing for Exact Match Searches:
To enable exact match searches (
WHERE x = y
) without decrypting every row, you can use a blind index. This involves hashing your sensitive data and storing the hash alongside the encrypted field as another column. When you need to search, you hash the input and compare it to the stored hashes.Instead of complicating things with SQL functions, you can handle the hashing in your application code.
This keeps your SQL straightforward. Make sure you are using a secure hashing algorithm and that you add a salt to the hashed data to keep things secure.
Bloom Filters for Substring Searches:
If you need to perform substring or pattern matching (
WHERE x LIKE y
) on encrypted data, there is a more complex bloom filter index. Bloom filter is basically how normal substring search works but normally bloom filter is computed for trigrams (groups of 3 characters) of a text value. The nuance for making it secure for sensitive data is that every trigram need to be hashed before adding it to the bloom filter. Also note that bloom filter relay on some probabilistic assumptions so you need to consider a lot of variables in mind like what is the length of the average record data and what is anthropy of the data to choose what should be the hash length to minimize collisions. Also bloom filter helps you filter the data to candidates but these candidates might still have false positives so you must verify these candidates eventually with additional query to the database and decrypting the data (which is more efficient than decrypting the entire database).Keep in mind:
Use a Third-Party Solution:
Alternatively, you could use a tool like Piiano Vault that provides built-in support for querying encrypted data, including substring searches using the techniques mentioned above and more protective measures.
Advantages:
Considerations:
Full disclosure: I’m a software engineer at Piiano.
In my experience, if you just need exact match searches, blind indexing with hashing in your application code is a simple and effective solution but make sure you use a secure hashing algorithm and that you add a salt to every hash.
For more complex searching needs, especially if you want to save time and avoid complexity, a tool like Piiano Vault or similar alternative can be a real lifesaver.