I’m a SQL novice. I have a big data table with more than 10,000 columns, currently stored in a CSV file, and those columns come from multiple sites. I tried to import it into a database to manage it, since most of the columns are insignificant but I still need them occasionally, and I will keep updating and adding new data going forward.
The problem is that the database's per-table column limit can't hold that many fields. I'm looking for a way to store the data split across several tables in one database and use a single interface to select all of the associated data.
For example: table1 has two columns (column1, column2), and table2 has two columns (column3, column4). The two tables are connected through column1 and column3, and I want to select by column1 and get the column2 and column4 data together.
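Roughly, I imagine something like the sketch below (table and column names are just placeholders, and I'm not sure whether a view is the right kind of "single interface"):

```sql
-- Two narrow tables linked by a shared key (names are placeholders).
CREATE TABLE table1 (column1 text PRIMARY KEY, column2 text);
CREATE TABLE table2 (column3 text REFERENCES table1 (column1), column4 text);

-- A view as the "single interface" that joins them back together.
CREATE VIEW combined AS
SELECT t1.column1, t1.column2, t2.column4
FROM table1 t1
JOIN table2 t2 ON t2.column3 = t1.column1;
```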
Answer:
Create a table with the important, frequently used columns as normal columns and add a jsonb column. Convert all the rare and unimportant columns into JSON and store them in the jsonb column. If you need to search on attributes inside that JSON, create a GIN index on the column.
There are a few limitations in Postgres here. If you are doing this kind of indexing at scale, you might consider something like pg_bm25, or you might instead use a column-oriented Postgres extension if your data has far fewer rows than columns in its current format.
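As a rough sketch (the table name, the frequent columns, and the JSON attribute names below are made up, not taken from your data), the jsonb approach could look like this:

```sql
-- Keep the important, frequent columns as normal columns;
-- everything rare goes into one jsonb column.
CREATE TABLE measurements (
    id          bigserial PRIMARY KEY,
    site_id     text NOT NULL,          -- frequent, important column
    recorded_at timestamptz NOT NULL,   -- frequent, important column
    extra       jsonb                   -- all the rare, unimportant columns
);

-- GIN index so lookups inside the JSON stay fast.
CREATE INDEX measurements_extra_gin ON measurements USING GIN (extra);

-- Insert a row: the rare attributes go into the jsonb column.
INSERT INTO measurements (site_id, recorded_at, extra)
VALUES ('site_42', now(),
        '{"sensor_a": 1.7, "sensor_b": null, "firmware": "2.1.0"}');

-- Query by a JSON attribute; the @> containment operator can use the GIN index.
SELECT site_id, recorded_at, extra->>'firmware' AS firmware
FROM measurements
WHERE extra @> '{"firmware": "2.1.0"}';
```

The frequent columns remain ordinary relational columns you can constrain, index, and update as usual, while the thousands of rare attributes collapse into a single jsonb column that stays well under any per-table column limit.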