Hi, I have a database with 10 tables. Each table has ~0.5-1 billion rows and is partitioned by range and then hash (10×10 = 100 partitions). The tables are newly created, with data imported from CSV files. They are indexed on the column (id) used in the queries below. The database is hosted on an Azure PostgreSQL Single Server.
A test query shows that most of the time is spent on "I/O Timings: read":
postgres=> EXPLAIN (ANALYZE, BUFFERS) select count(*) from table_id4 where id=244730;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7458.09..7458.10 rows=1 width=8) (actual time=21141.393..21141.396 rows=1 loops=1)
   Buffers: shared read=2256
   I/O Timings: read=21096.814
   ->  Append  (cost=41.26..7452.66 rows=2171 width=0) (actual time=197.168..21138.495 rows=2247 loops=1)
         Buffers: shared read=2256
         I/O Timings: read=21096.814
         ->  Bitmap Heap Scan on table_id4_r2_h5  (cost=41.26..7441.80 rows=2171 width=0) (actual time=197.167..21137.471 rows=2247 loops=1)
               Recheck Cond: (id = 244730)
               Heap Blocks: exact=2247
               Buffers: shared read=2256
               I/O Timings: read=21096.814
               ->  Bitmap Index Scan on table_id4_r2_h5_id_idx  (cost=0.00..40.72 rows=2171 width=0) (actual time=117.586..117.586 rows=2247 loops=1)
                     Index Cond: (id = 244730)
                     Buffers: shared read=9
                     I/O Timings: read=116.929
 Planning Time: 2.882 ms
 Execution Time: 21141.449 ms
(17 rows)
I also tried ANALYZE table_name and then SET enable_bitmapscan = off; according to this post. The query plan is now an index-only scan, but it is still slow sometimes.
postgres=> EXPLAIN (ANALYZE, BUFFERS) select count(*) from table_id2 where id=179863;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5528.78..5528.79 rows=1 width=8) (actual time=24273.748..24273.749 rows=1 loops=1)
   Buffers: shared read=1943
   I/O Timings: read=24218.684
   ->  Append  (cost=0.42..5523.93 rows=1940 width=0) (actual time=83.735..24271.326 rows=1959 loops=1)
         Buffers: shared read=1943
         I/O Timings: read=24218.684
         ->  Index Only Scan using table_id2_r10_h5_id_idx on table_id2_r10_h5  (cost=0.42..5514.23 rows=1940 width=0) (actual time=83.734..24270.157 rows=1959 loops=1)
               Index Cond: (id = 179863)
               Heap Fetches: 1959
               Buffers: shared read=1943
               I/O Timings: read=24218.684
 Planning Time: 3.254 ms
 Execution Time: 24273.787 ms
(13 rows)
[UPDATE] To answer @jjanes's question on I/O speed: this database is a "replica" of an existing Azure DB that can answer the same queries on the same tables in <1 sec, as shown by ANALYZE in my last post. The difference I notice in the EXPLAIN results is that the old DB shows only Buffers: shared hit=4479 read=21, i.e. fewer reads and more hits. This Azure DB has the same price tier as the old one (Single Server, Basic, both with "IOPS: Variable"). I only upgraded the storage from 600 GB to 1 TB, so I think I/O speed may not be the issue?
I ran a batch test, and it shows quite varied query times within the same loop:
FOR idx IN SELECT (random()*total_IDs)::int AS id FROM generate_series(1, 10)
LOOP ...
    select count(*) from table_id4 where id = idx.id;
...
END LOOP;
NOTICE: id: 321158 count#: 2154, time: 46.734967s
NOTICE: id: 487596 count#: 2238, time: 0.968759s
NOTICE: id: 548334 count#: 2180, time: 1.062516s
NOTICE: id: 404978 count#: 2179, time: 29.750295s
NOTICE: id: 370904 count#: 2123, time: 22.203384s
NOTICE: id: 228857 count#: 2223, time: 29.094126s
NOTICE: id: 327134 count#: 2169, time: 24.750242s
NOTICE: id: 372101 count#: 2180, time: 28.062825s
NOTICE: id: 341814 count#: 2130, time: 30.250353s
NOTICE: id: 248316 count#: 2195, time: 32.375377s
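For completeness, the elided loop above can be written as a full DO block; this is a sketch, where total_IDs is a placeholder for the upper bound of the id range and the timing uses clock_timestamp():

```sql
DO $$
DECLARE
    total_IDs integer := 600000;   -- placeholder: upper bound of the id range
    idx record;
    n bigint;
    t0 timestamptz;
BEGIN
    FOR idx IN SELECT (random() * total_IDs)::int AS id
               FROM generate_series(1, 10)
    LOOP
        t0 := clock_timestamp();
        SELECT count(*) INTO n FROM table_id4 WHERE id = idx.id;
        -- clock_timestamp() advances during the transaction, unlike now()
        RAISE NOTICE 'id: % count#: %, time: %s',
            idx.id, n, extract(epoch FROM clock_timestamp() - t0);
    END LOOP;
END $$;
```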
but if I repeat the queries on the same ids, the times drop to the expected milliseconds:
psql -c " ...
select count(*) from table_id4 where id=321158;
select count(*) from table_id4 where id=487596;
select count(*) from table_id4 where id=548334;
select count(*) from table_id4 where id=404978;
select count(*) from table_id4 where id=370904;
select count(*) from table_id4 where id=228857;
"
Time: 5267.168 ms (00:05.267)
Time: 171.925 ms
Time: 24.942 ms
Time: 11.387 ms
Time: 6.753 ms
Time: 17.573 ms
Other tables behave similarly under these queries; here are the table's details:
postgres=> \d+ table_id4
Unlogged table "table_id4"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+----------+-----------+----------+---------+---------+--------------+-------------
date | date | | not null | | plain | |
field1 | real | | | | plain | |
field2 | real | | | | plain | |
field3 | smallint | | | | plain | |
id | integer | | not null | | plain | |
Partition key: RANGE (id)
Indexes:
"table_id4_date_idx" btree (date)
"table_id4_id_idx" btree (id)
Partitions: table_id4_r1 FOR VALUES FROM (0) TO (1...5), PARTITIONED,
table_id4_r10 FOR VALUES FROM (...93) TO (MAXVALUE), PARTITIONED,
table_id4_r2 FOR VALUES FROM (1...) TO (3...), PARTITIONED,
table_id4_r3 FOR VALUES FROM (3...) TO (4...), PARTITIONED,
table_id4_r4 FOR VALUES FROM (4...) TO (6...), PARTITIONED,
table_id4_r5 FOR VALUES FROM (6...) TO (7...), PARTITIONED,
table_id4_r6 FOR VALUES FROM (7...) TO (9...), PARTITIONED,
table_id4_r7 FOR VALUES FROM (9...) TO (1...1), PARTITIONED,
table_id4_r8 FOR VALUES FROM (1...1) TO (...32), PARTITIONED,
table_id4_r9 FOR VALUES FROM (1...2) TO (...93), PARTITIONED
I have seen some similar discussions here and here, but I am not very familiar with databases and those tables are pretty large, so I would like to understand the problem better before running REINDEX or VACUUM etc., which may take days to complete.
The server resource monitor on the Azure portal shows MAX usage (the maximum within a given time window, not averaged over time) of [storage, memory, CPU]: 55%, 30%, <5%. So it seems resources are not the bottleneck?
some server parameters:
CPU: vCore 2
total memory: 4GB
storage: 1 TB
shared_buffers: 512MB
work_mem: 4MB (changed to 256MB, but it did not help)
max_parallel_workers: 10
max_parallel_maintenance_workers: 8
Would setting the tables back to LOGGED, or running ENABLE TRIGGER ALL, be helpful?
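For reference, switching the unlogged partitions back to logged would look roughly like the sketch below. Note this is an assumption about your setup: SET LOGGED has to be applied to the leaf partitions that actually hold data, and it writes each partition's entire contents to WAL, so it can take a long time on tables this size.

```sql
-- Rewrites the partition's contents into WAL so it survives a crash;
-- expensive on billion-row partitions.
ALTER TABLE table_id4_r2_h5 SET LOGGED;
-- ...repeat for the other leaf partitions...
```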
2 Answers
I have some updates, and I think it may be better to post them separately.

A few tables have finished VACUUM FREEZE ANALYZE. My further tests show the slowness differs between select * from table_name where id=xxxx and select count(*) from table_name where id=xxxx: the latter is very fast. (Before VACUUM, both queries were slow.)

I ran EXPLAIN (ANALYZE, BUFFERS) on both forms for many random id queries and list the results below. The ids used are randomly generated and differ from the two lines above. The outputs "query_all.log" and "query_count.log" contain many blocks like the ones below. Since the log files are very long and I focus on the I/O here, I only list the Buffers: shared related lines from those ANALYZE logs; the count(*) log shows much shorter execution times.

I ran ~50 random batch tests. The select * usually takes either <1 sec or >~10 sec, not in between 1-10 sec; the select count(*) usually takes 0.5 sec and is always <1 sec. If the slow I/O were caused by Azure, I would expect an "even" variation in the execution times, say ranging linearly from 0-20 sec? But here I only see pretty fast (<1 sec) and very slow (>10 sec), nothing in between. Is this possibly still some issue with my query plan not being optimized?

As for the difference between select * and select count(*): is it caused by less data transfer (e.g., Buffers: shared read=... written=xxx?) in the latter? Or, as this post suggests, does count(*) skip the heap and count entries in the index?

You are going to need a vacuum to happen at some point anyway, so I don't see why you should delay doing one just as a learning exercise. A reindex, on the other hand, might never need to happen, and will lock your table while it is going on, so I wouldn't preemptively do one of those.
I guess you could wonder why autovacuum isn't already taking care of it, and maybe a manual vacuum would destroy evidence around that question? Also, since a manual vacuum is (by default) unthrottled, doing one could cause performance problems for other queries by using up the available resources.
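If you do run a manual vacuum and are worried about it starving other queries, you can re-enable cost-based throttling for the session before starting. A sketch, with illustrative values:

```sql
-- Manual VACUUM runs unthrottled by default (vacuum_cost_delay = 0);
-- setting a delay makes it pause whenever the cost limit is reached.
SET vacuum_cost_delay = '2ms';
SET vacuum_cost_limit = 200;
VACUUM (VERBOSE, ANALYZE) table_id4;
```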
For this case, I would expect it to be doing an index-only scan, and the only reason I can think of why it would not is that your pg_class.relallvisible for the table is very low relative to relpages (which in turn probably indicates that vacuum is not doing what it should). There are two ways to assess this other than running a preemptive vacuum. You could directly query pg_class and see what it says for relallvisible and relpages for the relevant tables, or you could force an index-only scan and see what it reports for heap fetches in the explain (analyze, buffers) when it does one. For the latter, doing set enable_bitmapscan = off will probably suffice to get it to do the index-only scan.

I see you added the bitmap-off plan while I was typing, and indeed the heap fetches are as high as the rows returned, so the visibility map is not tuned up, and a vacuum is what you need to make that happen. Since you are using an old version (autovacuum_vacuum_insert_scale_factor was introduced in v13) and these are fresh tables (inserted/copied but never updated or deleted), a manual vacuum is the only way to get them vacuumed.
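Checking the visibility-map state described above could look like this (a sketch; relallvisible close to relpages means index-only scans can skip most heap fetches):

```sql
-- Partition name pattern is an assumption based on the \d+ output above.
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname LIKE 'table_id4_r%'
ORDER BY relname;
```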
I'm not familiar with this Azure portal. Does "storage" refer to capacity of storage (before it runs out of space), or performance of storage? If performance, does it refer to byte throughput (where 100% can only be reached by sequential reads) or to IOPS? If the latter, then maybe you would benefit from increasing effective_io_concurrency.
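Raising effective_io_concurrency only needs a session- or system-level setting, e.g. (the value here is illustrative):

```sql
-- Lets bitmap heap scans issue more concurrent prefetch requests;
-- mainly helps on storage that benefits from queued I/O (e.g. SSD).
SET effective_io_concurrency = 200;
```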