
I was told that it’s better to have many rows of data than many columns in a database. I need a database for a transactional application, and I intend to use PostgreSQL. As far as I understand, I have these two options:

Option 1:

id | user_id | key     | value
-----------------------------------
1  | 1       | name    | jon
2  | 1       | surname | snow
3  | 1       | address | winterfell
4  | 2       | name    | daenerys
5  | 2       | surname | targaryen
6  | 2       | address | essos

Option 2:

id | name     | surname   | address
--------------------------------------
1  | jon      | snow      | winterfell
2  | daenerys | targaryen | essos

Since option 1 has a small number of columns, I assume its queries should be faster, whereas option 2 could end up with many columns, so filtering the data would take more time.
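For concreteness, the two options could be created with DDL like this (the column types are my assumption):

    -- Option 1: one row per attribute (key/value)
    CREATE TABLE table1 (
        id      bigserial PRIMARY KEY,
        user_id bigint NOT NULL,
        key     text   NOT NULL,
        value   text   NOT NULL
    );

    -- Option 2: one column per attribute
    CREATE TABLE table2 (
        id      bigserial PRIMARY KEY,
        name    text,
        surname text,
        address text
    );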

Can anyone advise what the correct approach is for PostgreSQL databases?

Thanks!

2 Answers


  1. The first table design is an entity-attribute-value (EAV) key/value store. Adding a new key/value pair requires only inserting a new row, which is a relatively cheap and simple operation. Retrieving specific values can be achieved via an aggregation query that pivots the data:

    SELECT
        user_id,
        MAX(value) FILTER (WHERE "key" = 'name') AS name,
        MAX(value) FILTER (WHERE "key" = 'surname') AS surname
    FROM table1
    GROUP BY user_id;
    

    The second table design can achieve the same as above using a much simpler query:

    SELECT id, name, surname
    FROM table2;
    

    But the drawback here is that the table structure is rigid and fixed. If you need to frequently add or remove user attributes, each change requires an intrusive ALTER TABLE statement affecting the entire table.
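    As a sketch, adding and then removing an attribute in option 2 would look like this (the `title` column is hypothetical):

        -- Each schema change is an ALTER TABLE on the whole table
        ALTER TABLE table2 ADD COLUMN title text;
        ALTER TABLE table2 DROP COLUMN title;

    In option 1, the same change is just an INSERT or DELETE of rows, with no schema change at all.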

    The option you choose depends on your data model. For a known/fixed set of user attributes, the second option looks better. If user attributes are unknown and you require more flexibility, the first option might be a better choice.

  2. If you know roughly what the attributes would be, then you should go with option 2. It’ll allow you to define better indexes on your table, thus improving read performance. With option 1, it’s likely you’ll have a small number of distinct values in the key column, so even index scans will be slow.
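    For example, a lookup by surname in option 2 can be served by a plain B-tree index (the index name and query are illustrative):

        -- A selective index on a real column
        CREATE INDEX table2_surname_idx ON table2 (surname);

        SELECT id, name, surname
        FROM table2
        WHERE surname = 'targaryen';

    In option 1, an equivalent index on the key column helps far less, because each key appears in a large fraction of the rows.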

    The only reason I’d use option 1 is if I either have no idea what the attributes will be and/or the number of attributes is huge (e.g. say 100 attributes). Even then, I’d prefer to model this data as a one-to-many relationship using multiple tables.
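    A minimal sketch of that one-to-many model (table and column names are my assumptions):

        -- Fixed, well-known attributes stay as columns
        CREATE TABLE users (
            id   bigserial PRIMARY KEY,
            name text NOT NULL
        );

        -- Repeating or optional data moves to a child table
        CREATE TABLE user_addresses (
            id      bigserial PRIMARY KEY,
            user_id bigint NOT NULL REFERENCES users (id),
            address text   NOT NULL
        );

    This keeps queries on the core attributes simple and indexable while still allowing a variable number of related records per user.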
