How can I improve multiple queries into one - Postgresql

Camel4488
December 29, 2022
246 views
1 vote
2 Answers

I want to improve the performance of these 2 queries:

select object into latitude  
from data
where predicate = 'latitude' 
and subject = ( select object 
                from data 
                where subject = (select object 
                                 from data 
                                  where subject = 'url1' and predicate = '#isLocatedAt') 
and predicate = 'http://schema.org/geo');

select object into Longitude  
from data 
where predicate = 'longitude' 
and subject = ( select object 
                from data 
                where subject = (select object 
                                 from data 
                                 where subject = 'url1' and predicate = '#isLocatedAt') 
and predicate = 'http://schema.org/geo');

fiddle

I don’t have an index on data.object because it’s a text and it’s too big.

Explain analyse (for 1 requête):

QUERY PLAN
Index Scan using subjectetpredicate on data  (cost=25.19..36.76 rows=3 width=63) (actual time=34.299..34.303 rows=1 loops=1)
  Index Cond: (((subject)::text = $1) AND ((predicate)::text = 'latitude'::text))
  InitPlan 2 (returns $1)
    ->  Index Scan using subjectetpredicate on data data_2  (cost=12.94..24.50 rows=3 width=63) (actual time=31.374..31.379 rows=1 loops=1)
          Index Cond: (((subject)::text = $0) AND ((predicate)::text = 'geo'::text))
          InitPlan 1 (returns $0)
            ->  Index Scan using subjectetpredicate on data data_1  (cost=0.69..12.25 rows=3 width=63) (actual time=0.329..0.332 rows=1 loops=1)
                  Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
Planning Time: 1.071 ms
Execution Time: 34.359 ms

Tags: postgresql

Answers

Your query plan shows that you’re doing 6 index scans sequentially. Rewrite your script so that you have to do only 4, by storing the shared result in a temporary variable:

select object into geolocation
--            ^^^^^^^^^^^^^^^^
from data
where predicate = 'http://schema.org/geo'
  and subject = (select object 
                 from data 
                 where predicate = '#isLocatedAt'
                   and subject = 'url1');

select object into latitude  
from data
where predicate = 'latitude' 
  and subject = geolocation;
--              ^^^^^^^^^^^

select object into longitude  
from data 
where predicate = 'longitude' 
  and subject = geolocation;
--              ^^^^^^^^^^^

You can achieve the same in a single query (which doesn’t necessarily make it any faster or easier to read) by using a CTE or simply by flipping your subqueries to get a result with multiple columns:

select
  (
    select object  
    from data
    where predicate = 'longitude'
      and subject = geolocation
  ), (
    select object
    from data
    where predicate = 'latitude'
      and subject = geolocation
  )
  into longitude, latitude
from (
  select object as geolocation
  from data
  where predicate = 'http://schema.org/geo'
    and subject = (select object 
                   from data 
                   where predicate = '#isLocatedAt'
                     and subject = 'url1')
) as temp;

Instead of nested subqueries, we can rethink this as a series of self-joins and flip it on its head. I find this much easier to understand, and it’s much faster (at least on this tiny dataset).

url1, #isLocatedAt, y
y, http://schema.org/geo, z
z, (latitude, longitude)

To do it in a single query, filter it on where predicate in ('latitude', 'longitude').

select
  l2.predicate, l2.object
from data l0
join data l1 on l0.object = l1.subject and l1.predicate  = 'http://schema.org/geo'
join data l2 on l1.object = l2.subject and l2.predicate in ('latitude', 'longitude')
where l0.subject = 'url1' and l0.predicate = '#isLocatedAt'

This is a couple orders of magnitude faster, though it would have to be run against a realistic amount of data to matter.

QUERY PLAN
Nested Loop  (cost=0.43..24.51 rows=1 width=548) (actual time=0.040..0.042 rows=2 loops=1)
  ->  Nested Loop  (cost=0.29..16.34 rows=1 width=32) (actual time=0.022..0.023 rows=1 loops=1)
        Join Filter: (l0.object = (l1.subject)::text)
        ->  Index Scan using subjectetpredicate on data l0  (cost=0.14..8.16 rows=1 width=32) (actual time=0.011..0.012 rows=1 loops=1)
              Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
        ->  Index Scan using predicate on data l1  (cost=0.14..8.16 rows=1 width=548) (actual time=0.007..0.008 rows=1 loops=1)
              Index Cond: ((predicate)::text = 'http://schema.org/geo'::text)
  ->  Index Scan using subjectetpredicate on data l2  (cost=0.14..8.16 rows=1 width=1064) (actual time=0.017..0.018 rows=2 loops=1)
        Index Cond: ((subject)::text = l1.object)
        Filter: ((predicate)::text = ANY ('{latitude,longitude}'::text[]))
Planning Time: 0.192 ms
Execution Time: 0.072 ms

Demonstration.

This approach is also a step towards generalizing it as a recursive CTE.

Note that you don’t need an index on subject because you have an index on (subject, predicate).

Please signup or login to give your own answer.

Click here to cancel reply.

How can I improve multiple queries into one – Postgresql

Answers