I have three tables:
A(id varchar(36) pk, c_id varchar(36)) with 1.2M rows
B(id varchar(36) pk, c_id varchar(36)) with 0.6M rows
C(id varchar(36) pk, "number" varchar(128), "status" varchar(255)) with 5.4M rows
id is a UUID string,
"number" is a 12-character string,
"status" is one of 50 string values.
I have btree indexes on c_id in A and B, and on "number" and "status" in C.
I'm executing the following queries:
select a.id as id1, c.id as id2 from "A" a
inner join "C" c on a."c_id" = c."id"
where c."status" in ('seven')
order by c."number" asc
limit 10;
select b.id as id1, c.id as id2 from "B" b
inner join "C" c on b."c_id" = c."id"
where c."status" in ('seven')
order by c."number" asc
limit 10;
The first one takes roughly 70 ms, the second one roughly 1.5 s, although by the look of them they should take a similar amount of time.
EXPLAIN ANALYZE shows quite odd results:
A:
-> Index Scan using c_number_idx on C c (cost=0.43..917514.17 rows=2819568 width=50) (actual time=0.451..9.997 rows=17 loops=1)
Filter: (("status")::text = 'seven'::text)
Rows Removed by Filter: 4727
Buffers: shared hit=4679
B:
-> Index Scan using c_number_idx on C c (cost=0.43..917514.17 rows=2819568 width=50) (actual time=0.673..878.256 rows=148095 loops=1)
Filter: (("status")::text = 'seven'::text)
Rows Removed by Filter: 184231
Buffers: shared hit=333349
I don't understand three things:
- why it uses the "number" index for the scan when it is filtering on "status"
- why the index scan for table B has far more shared buffer hits and rows removed by filter than the one for table A
- why the same query takes a lot longer for table B than for table A
I tried fully vacuuming tables B and C, I tried recreating the indexes, and I ran the queries several times, but the result is the same.
I don't want to use clustering, because I have a lot of other possible filters.
What could be the problem?
EDIT
DDL:
CREATE TABLE A(
id varchar(36) primary key,
c_id varchar(36)
);
create index on A(c_id);
CREATE TABLE B(
id varchar(36) primary key,
c_id varchar(36)
);
create index on B(c_id);
CREATE TABLE C(
id varchar(36) primary key,
"number" varchar(128),
"status" varchar(255)
);
create index on C("number");
create index on C("status");
2 Answers
First of all, using a UUID as a PK is counter-performant, because it is stored as a literal type in PG (other RDBMSs use BINARY values for UUID/GUID, which is much more efficient), so it is 36 bytes long and carries some extra overhead due to collation.
36 bytes is 4.5 times the length of the processor word (64 bits => 8 bytes), so the effort to read a single UUID is 4.5 times that of reading a BIGINT PK.
UUIDs are arbitrary values, so they systematically bloat PK indexes; INT/BIGINT with IDENTITY or SEQUENCE do not.
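For illustration only, a sketch of the sequential-key alternative being suggested here (assuming PostgreSQL 10 or later for identity columns; switching an existing schema over also means changing every referencing column):
CREATE TABLE C(
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- sequential surrogate key instead of a UUID string
    "number" varchar(128),
    "status" varchar(255)
);
CREATE TABLE A(
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    c_id bigint REFERENCES C(id)  -- join key shrinks from a 36-byte string to 8 bytes
);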
Now, your two queries do not do the same thing, despite the fact that the pattern of the query is the same. It is not because two motorcycles have the same look that they have the same power and the same weight, or that they will race at the same speed!
You do not give us the complete DDL of the tables (CREATE TABLE ...) including indexes, so it is difficult to answer you with precision.
You only show fragments of the plans, which makes it hard to interpret them definitively. But presumably it is using the index it does so that it can get the rows already in order by "number", and stop early once it satisfies its LIMIT. I am guessing that the node you show is the first child of a nested loop.
The alternative (again, presumably) would be to read all 2819568 (expected) rows based on the "status" index, and then sort them. A sort cannot stop early in a meaningful way; it has to read its entire input before it can generate any output. And it is going to take a long time to read 2819568 randomly scattered rows using random IO, so that is not the preferred plan (in your case, all the buffers were already cached, but the planner doesn't know that that will be the case, and maybe it was the case here only because you ran the same query repeatedly for testing).
In case A this is very effective, as it only needed to read 17 + 4727 rows before it found 10 it could keep and thus meet the LIMIT, but in case B it is much less effective, because most of the early rows fail one of the other criteria and so must be discarded. In case A, it appears that 4727 rows were discarded because of "status" and 7 because no join partner could be found for them. In case B, 184231 were discarded due to "status" and 148085 because no join partner could be found.
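If you want to see what that rejected alternative would do for the B query, one way to experiment (just a sketch: wrapping the ORDER BY column in an expression is only there to keep the planner from using the plain "number" index to supply the ordering) is something like:
explain (analyze, buffers)
select b.id as id1, c.id as id2 from "B" b
inner join "C" c on b."c_id" = c."id"
where c."status" in ('seven')
order by c."number" || '' asc  -- expression ordering, so the "number" index cannot provide it
limit 10;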
The problem of the rows rejected on "status" is easy to solve: just make a multi-column index on (status, "number"). That way it can jump directly to the part of the index where status = 'seven', and still read the rows already in order by "number". Better yet, make the index (status, "number", id); that way it can use an index-only scan.
The problem of so many rows in case B failing to find a join partner is not so easy to solve. So do the first thing first, and if it is still not fast enough, post the new plans, in their entirety.
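In terms of the DDL shown above, that would be something like this (a sketch; an index-only scan also needs the table vacuumed often enough that the visibility map stays current):
create index on C("status", "number");
-- or, also covering the join/output column, to allow an index-only scan:
create index on C("status", "number", id);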