I’m working with a few tables with large row counts that share a commonly named column (table_id). I now intend to add an index to this column in each table. One or two of these tables use up 10x more space than the others, but let’s say for simplicity that all tables have the same row count. Basically, the 10x added space is because some tables have many more columns than others. I have two inter-related questions:
- I’m curious whether overall indexing time is a function of the table’s disk usage or just the row count.
- Additionally, would duplicate values in a column speed up indexing at all, or would they actually slow it down?
If indexing time is only dependent on row count, then all my tables should get indexed at the same speed. I’m sure someone could do benchmark tests to answer all these questions, but my disks are currently tied up indexing those tables.
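For when the disks free up, here is one way to run that benchmark yourself, a sketch using psql's \timing command; narrow_table and wide_table are hypothetical tables with the same row count but different row widths:

```sql
-- In psql, \timing on reports how long each statement takes.
\timing on

CREATE INDEX narrow_idx ON narrow_table (table_id);
CREATE INDEX wide_idx   ON wide_table (table_id);

-- Compare the two reported durations, then clean up.
DROP INDEX narrow_idx, wide_idx;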
2 Answers
The speed of CREATE INDEX depends on several factors:
- the kind of index
- the number of rows
- the speed of your disks
- the settings of maintenance_work_mem and max_parallel_maintenance_workers
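The last two factors can be tuned per session before building the index. A sketch, assuming a table named big_table with the table_id column from the question; adjust the names and values to your schema and hardware:

```sql
-- More sort memory means fewer on-disk merge passes during the build.
SET maintenance_work_mem = '1GB';

-- Allow a parallel B-tree build (supported since PostgreSQL 11).
SET max_parallel_maintenance_workers = 4;

-- CONCURRENTLY avoids blocking writes, at the cost of a slower build.
CREATE INDEX CONCURRENTLY big_table_table_id_idx
    ON big_table (table_id);
```

These SET commands only affect the current session, so they will not disturb other connections.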
The effort to sort the data grows with O(n * log(n)), where n is the number of rows. Reading and writing should grow linearly with the number of rows.

I am not certain, but I’d say that duplicate rows should slow down indexing a little bit from v13 on, where B-tree index deduplication was introduced.