I was told that it’s better to have many rows of data instead of many columns in DB. I need a DB for transactional application, and intend to use PostgreSQL DB. As far as I understand I have these two options:
Option 1:
id | user_id | key | value
-----------------------------------
1 | 1 | name | jon
2 | 1 | surname | snow
3 | 1 | address | winterfell
4 | 2 | name | daenerys
5 | 2 | surname | targaryen
6 | 2 | address | essos
Option 2:
id | name | surname | address
--------------------------------------
1 | jon | snow | winterfell
2 | daenerys | targaryen | essos
Since option 1 has small number of columns the queries should be faster, as option 2 could have many columns, and to filter the data would take more time.
Can anyone advise what is the correct way for PostgreSQL DBs?
Thanks!
2
Answers
The first table design is a standard unnormalized key value store. Adding an additional key/value pair requires only adding a new record, which is a relatively cheap and simple operation. Accessing certain values can be achieved via an aggregation query with pivoting:
The second table design can achieve the same as above using a much simpler query:
But the drawback here is that the table structure is rigid and fixed. If you do need to frequently add or remove user attributes, it requires an intrusive
ALTER
statement to the entire table, to add or remove columns.The option you choose depends on your data model. For a known/fixed set of user attributes, the second options looks better. If user attributes are unknown and you require more flexibility, the first option might be a better choice.
If you know roughly what the attributes would be, then you should go with option 2. It’ll allow you to define better indexes on your table, thus improving read performance. With option 1, it’s likely you’ll have a small number of distinct values in the key column, so even index scans will be slow.
The only reason I’d use option 1 is if I either have no idea what the attributes will be and/or the number of attributes is huge (e.g. say 100 attributes). Even then, I’d prefer to model this data as a one-to-many relationship using multiple tables.