I’m working with a few tables with large row counts that share a commonly named column (table_id). I now intend to add an index to this column in each table. One or two of these tables use up 10x more space than the others, but let’s say for simplicity that all tables have the same row count. Basically, the 10x added space is because some tables have many more columns than others. I have two inter-related questions:
- I’m curious whether overall indexing time is a function of the table’s disk usage or just the row count.
- Additionally, would duplicate values in a column speed up indexing at all, or would they actually slow it down?
If indexing time is only dependent on row count, then all my tables should get indexed at the same speed. I’m sure someone could do benchmark tests to answer all these questions, but my disks are currently tied up indexing those tables.
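For when the disks free up, here is one way to run that benchmark yourself, a sketch using psql's \timing command; narrow_table and wide_table are hypothetical tables with the same row count but different row widths:

```sql
-- In psql, \timing on reports how long each statement takes.
\timing on

CREATE INDEX narrow_idx ON narrow_table (table_id);
CREATE INDEX wide_idx   ON wide_table (table_id);

-- Compare the two reported durations, then clean up.
DROP INDEX narrow_idx, wide_idx;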
2 Answers
The speed of CREATE INDEX depends on several factors:
- the kind of index
- the number of rows
- the speed of your disks
- the settings of maintenance_work_mem and max_parallel_maintenance_workers
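The last two factors can be tuned per session before building the index. A sketch, assuming a table named big_table with the table_id column from the question; adjust the names and values to your schema and hardware:

```sql
-- More sort memory means fewer on-disk merge passes during the build.
SET maintenance_work_mem = '1GB';

-- Allow a parallel B-tree build (supported since PostgreSQL 11).
SET max_parallel_maintenance_workers = 4;

-- CONCURRENTLY avoids blocking writes, at the cost of a slower build.
CREATE INDEX CONCURRENTLY big_table_table_id_idx
    ON big_table (table_id);
```

These SET commands only affect the current session, so they will not disturb other connections.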
The effort to sort the data grows with O(n * log(n)), where n is the number of rows. Reading and writing should grow linearly with the number of rows.

I am not certain, but I’d say that duplicate rows should slow down indexing a little bit from v13 on, where B-tree index deduplication was introduced.