skip to Main Content

Lets say I have a table T1. Initially it has ID, Identifier, C1, C2 as columns. As time goes by I will be adding new columns C3, C4 and so on. Will it be better to have the columns as ID, Identifier, Info with Info being a JSON column with C1, C2, C3 as keys and their corresponding values. What will be the read, write, update performance considering a scale of around 10^9 records.

2

Answers


  1. If you are going to use a relational database then you should use a relational model. While many RDBMSs can now handle JSON that doesn’t mean you should hold your data as JSON (by default).

    A good example of when to use JSON is if your source data is supplied as JSON. You can load it as-is into your RDBMS and then use SQL to "unpack" it from the staging table JSON into a proper relational format.

    Regarding performance, there are so many variables involved it is impossible to provide a useful answer. The only way for you to get a useful answer is for you to test it on your environment with your data and with queries relevant to you

    Login or Signup to reply.
    • If you need to include the ‘columns’ in WHERE or ORDER BY clauses, make them actual columns.
    • If you have a bunch of miscellany, especially if the values are often optional, then use JSON.
    • Adding a second table (vertically partitioning) when you add more columns; then LEFT JOINing to get C3 (etc) — There are pros and cons to this ‘solution’.
    • Entity-Attribute-Value schemas are notoriously problematic.
    • MySQL 8.0 can ALTER TABLE .. ADD COLUMN .. "instantly". But there are caveats.

    If you try to do Where/Order on JSON fields for a billion rows, your computer will melt down into a slag heap.

    If you want more advice on C1/C2/C3, please explain what your application is, how often columns will be added, how likely the cells will be NULL (missing, omitted, etc). We can discuss the alternatives.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search