Postgresql - Select records that do not have at least one child element

egeo
November 12, 2022
221 views
3 votes
4 Answers

How can I make an SQL query to select records that do not have at least one child element?

I have 3 tables: article (~40K rows), calendar (~450K rows) and calendar_cost (~500K rows).

It is necessary to select such entries of the article table:

there are no entries in the calendar table,
if there are entries in the calendar table, then all of them should not have any entries in the calendar_cost table.

create table article (
    id int PRIMARY KEY,
    name varchar
);

create table calendar (
    id int PRIMARY KEY,
    article_id int REFERENCES article (id) ON DELETE CASCADE,
    number varchar
);

create table calendar_cost (
    id int PRIMARY KEY,
    calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
    cost_value numeric
);

insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');

insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');

insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);

As a result, "Article 2" (condition 2) and "Article 3" (condition 1) will suit us.

My SQL query is very slow (the second condition part), how can I do it optimally? Is it possible to do without "union all" operator?

-- First condition
select a.id from article a 
left join calendar c on a.id = c.article_id
where c.id is null

union all 

-- Second condition
select a.id from article a
where id not in(
    select aa.id from article aa
    join calendar c on aa.id = c.article_id
    join calendar_cost cost on c.id = cost.calendar_id
    where aa.id = a.id limit 1
)

UPDATE

This is how you can fill my tables with random data for about the same amount of data. The @Bohemian query is very fast, and the rest are very slow. But as soon as I applied 2 indexes, as @nik advised, all queries began to be executed very, very quickly!

do $$
declare
    article_id int;     
    calendar_id bigint;
    i int; j int;
begin
    create table article (
        id int PRIMARY KEY,
        name varchar
    );
    
    create table calendar (
        id serial PRIMARY KEY,
        article_id int REFERENCES article (id) ON DELETE CASCADE,
        number varchar      
    );
    create INDEX ON calendar(article_id);

    create table calendar_cost (
        id serial PRIMARY KEY,
        calendar_id bigint REFERENCES calendar (id) ON DELETE CASCADE,
        cost_value numeric      
    );
    create INDEX ON calendar_cost(calendar_id);

    for article_id in 1..45000 loop
        insert into article (id, name) values (article_id, 'Article ' || article_id);
        for i in 0..floor(random() * 25) loop
            insert into calendar (article_id, number) values (article_id, 'Number ' || article_id || '-' || i) returning id into calendar_id;
            for j in 0..floor(random() * 2) loop
                insert into calendar_cost (calendar_id, cost_value) values (calendar_id, round((random() * 100)::numeric, 3));  
            end loop;
        end loop;
    end loop;
end $$;

@Bohemian
Planning Time: 0.405 ms
Execution Time: 1196.082 ms

@nbk
Planning Time: 0.702 ms
Execution Time: 165.129 ms

@Chris Maurer
Planning Time: 0.803 ms
Execution Time: 800.000 ms

@Stu
Planning Time: 0.446 ms
Execution Time: 280.842 ms

So which query to choose now as the right one is a matter of taste.

Answers

- Bohemian
- November 12, 2022 at 9:39 pm
- 0 votes
0
No need to split the conditions: The only condition you need to check for is that there are no calendar_cost rows whatsoever, which is the case if there are no calendar rows.

The trick is to use outer joins, which still return the parent table but have all null values when there is no join. Further, count() does not count null values, so requiring that the count of calendar_cost is zero is all you need.
```
select a.id
from article a
left join calendar c on c.article_id = a.id
left join calendar_cost cost on cost.calendar_id = c.id
group by a.id
having count(cost.calendar_id) = 0
```
See live demo.

If there are indexes on the id columns (the usual case), this query will perform quite well given the small table sizes.
Login or Signup to reply.

- ChrisMaurer
- November 12, 2022 at 9:41 pm
- 0 votes
0
Your second condition should start just like your first one: find all the calendar entries without calendar cost and only afterwards join it to article.
```
select a.id
from article a
  Inner Join (
    Select article_id
    From calendar c left join calendar_cost cc
      On c.id=cc.calendar_id
    Where cc.calendar_id is null
) cnone
  On a.id = cnone.article_id
```
This approach is based on the thought that calendar entries without calendar_cost is relatively rare compared to all the calendar entries.
Login or Signup to reply.

Your query is not valid as IN clauses don’t support LIMIT

Adding some indexes on article_id and calender_id

Will help the performance

As you can see in the query plan

create table article (
    id int PRIMARY KEY,
    name varchar(100)
);

create table calendar (
    id int PRIMARY KEY,
    article_id int REFERENCES article (id) ON DELETE CASCADE,
    number varchar(100)
  ,index(article_id)
);

create table calendar_cost (
    id int PRIMARY KEY,
    calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
    cost_value numeric
  ,INDEX(calendar_id)
);

insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');

insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');

insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);

Records: 3  Duplicates: 0  Warnings: 0

Records: 3  Duplicates: 0  Warnings: 0

Records: 2  Duplicates: 0  Warnings: 2

select a.id from article a 
left join calendar c on a.id = c.article_id
where c.id is null

id
3

-- First condition
EXPLAIN
  select a.id from article a 
left join calendar c on a.id = c.article_id
where c.id is null

union all 

-- Second condition
select a.id from article a
JOIN (
    select aa.id from article aa
    join calendar c on aa.id = c.article_id
    join calendar_cost cost on c.id = cost.calendar_id

  LIMIT 1

) t1 ON t1.id <> a.id

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	a	null	index	null	PRIMARY	4	null	3	100.00	Using index
1	PRIMARY	c	null	ref	article_id	article_id	5	fiddle.a.id	3	33.33	Using where; Not exists; Using index
2	UNION	<derived3>	null	system	null	null	null	null	1	100.00	null
2	UNION	a	null	index	null	PRIMARY	4	null	3	66.67	Using where; Using index
3	DERIVED	cost	null	index	calendar_id	calendar_id	5	null	2	100.00	Using where; Using index
3	DERIVED	c	null	eq_ref	PRIMARY,article_id	PRIMARY	4	fiddle.cost.calendar_id	1	100.00	Using where
3	DERIVED	aa	null	eq_ref	PRIMARY	PRIMARY	4	fiddle.c.article_id	1	100.00	Using index

fiddle

- Stu
- November 12, 2022 at 9:45 pm
- 0 votes
0
Try the following using a combination of exists criteria.

Usually, with supporting indexes, this is more performant than simply joining tables as it offers a short-circuit to get out as soon as a match is found, where as joining typically filters after all rows are joined.
```
select a.id 
from article a 
where not exists (
  select * from calendar c 
  where c.article_id =  a.id
)
or (exists (
  select * from calendar c 
  where c.article_id = a.id
)
  and not exists (
    select * from calendar_cost cc
    where cc.calendar_id in (select id from calendar c where c.article_id = a.id)
  )
);
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Select records that do not have at least one child element

Answers