
I’m an SQL novice. I have a big data table with more than 10,000 columns, currently stored as a CSV file, and those columns come from multiple sites. I tried to import it into a database to manage it, because most of the columns are insignificant but I still need them occasionally, and I will keep updating and adding new data going forward.

Now the problem is that the database’s per-table column limit can’t hold that many fields. I’m looking for a way to store the data across several tables in one database and use a single interface to select all of the associated data.
For example:
table1 with two columns, column1 and column2,
and table2 with two columns, column3 and column4,
where the two tables are connected through column1 and column3. I want to select by column1 and get the column2 and column4 data together.
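Something like the following minimal sketch of that split-table layout, reusing the placeholder names table1/table2 and column1..column4 from the question (the primary-key and foreign-key constraints and the view name "combined" are assumptions, not from the original post):

    -- Two narrow tables linked by a shared key
    CREATE TABLE table1 (
        column1 text PRIMARY KEY,   -- the shared key
        column2 text
    );

    CREATE TABLE table2 (
        column3 text PRIMARY KEY REFERENCES table1 (column1),
        column4 text
    );

    -- A single "interface" that joins the associated data back together
    CREATE VIEW combined AS
    SELECT t1.column1, t1.column2, t2.column4
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.column3 = t1.column1;

    -- Selecting by the key then returns column2 and column4 together
    SELECT column2, column4 FROM combined WHERE column1 = 'some-key';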

csv data

2 Answers


  1. Create a table with the important, frequently used columns as normal columns and add a jsonb column. Convert all the rare and unimportant columns into a JSON document and store it in the jsonb column. If you need to search on attributes inside that JSON, create a GIN index on the column (a sketch of this approach follows the answers).

  2. You are running into one of Postgres’s limits here. If you need indexing at this scale, you might consider something like pg_bm25; alternatively, you might look at a column-oriented Postgres variant if your data has far fewer rows than columns in its current format.

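A minimal sketch of answer 1’s jsonb approach, assuming PostgreSQL; the table and attribute names (measurements, site_id, temperature, extra, sensor_42) are illustrative assumptions, not from the original post:

    CREATE TABLE measurements (
        site_id     text,
        recorded_at timestamptz,
        temperature numeric,   -- an important, frequently used column kept as a normal column
        extra       jsonb      -- all the rare/unimportant columns folded into one JSON document
    );

    -- GIN index so searches on attributes inside the JSON stay fast
    CREATE INDEX measurements_extra_gin ON measurements USING gin (extra);

    -- A containment query that can use the GIN index
    SELECT site_id, extra ->> 'sensor_42' AS sensor_42
    FROM measurements
    WHERE extra @> '{"sensor_42": "7.5"}';

New data with different or additional attributes can be loaded without schema changes, since any unforeseen column simply becomes another key inside the extra document.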