skip to Main Content

I want to improve the performance of these 2 queries:

select object into latitude  
from data
where predicate = 'latitude' 
and subject = ( select object 
                from data 
                where subject = (select object 
                                 from data 
                                  where subject = 'url1' and predicate = '#isLocatedAt') 
and predicate = 'http://schema.org/geo');

select object into Longitude  
from data 
where predicate = 'longitude' 
and subject = ( select object 
                from data 
                where subject = (select object 
                                 from data 
                                 where subject = 'url1' and predicate = '#isLocatedAt') 
and predicate = 'http://schema.org/geo');

fiddle

I don’t have an index on data.object because it’s a text and it’s too big.

Explain analyse (for 1 requête):

QUERY PLAN
Index Scan using subjectetpredicate on data  (cost=25.19..36.76 rows=3 width=63) (actual time=34.299..34.303 rows=1 loops=1)
  Index Cond: (((subject)::text = $1) AND ((predicate)::text = 'latitude'::text))
  InitPlan 2 (returns $1)
    ->  Index Scan using subjectetpredicate on data data_2  (cost=12.94..24.50 rows=3 width=63) (actual time=31.374..31.379 rows=1 loops=1)
          Index Cond: (((subject)::text = $0) AND ((predicate)::text = 'geo'::text))
          InitPlan 1 (returns $0)
            ->  Index Scan using subjectetpredicate on data data_1  (cost=0.69..12.25 rows=3 width=63) (actual time=0.329..0.332 rows=1 loops=1)
                  Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
Planning Time: 1.071 ms
Execution Time: 34.359 ms

2

Answers


  1. Your query plan shows that you’re doing 6 index scans sequentially. Rewrite your script so that you have to do only 4, by storing the shared result in a temporary variable:

    select object into geolocation
    --            ^^^^^^^^^^^^^^^^
    from data
    where predicate = 'http://schema.org/geo'
      and subject = (select object 
                     from data 
                     where predicate = '#isLocatedAt'
                       and subject = 'url1');
    
    select object into latitude  
    from data
    where predicate = 'latitude' 
      and subject = geolocation;
    --              ^^^^^^^^^^^
    
    select object into longitude  
    from data 
    where predicate = 'longitude' 
      and subject = geolocation;
    --              ^^^^^^^^^^^
    

    You can achieve the same in a single query (which doesn’t necessarily make it any faster or easier to read) by using a CTE or simply by flipping your subqueries to get a result with multiple columns:

    select
      (
        select object  
        from data
        where predicate = 'longitude'
          and subject = geolocation
      ), (
        select object
        from data
        where predicate = 'latitude'
          and subject = geolocation
      )
      into longitude, latitude
    from (
      select object as geolocation
      from data
      where predicate = 'http://schema.org/geo'
        and subject = (select object 
                       from data 
                       where predicate = '#isLocatedAt'
                         and subject = 'url1')
    ) as temp;
    
    Login or Signup to reply.
  2. Instead of nested subqueries, we can rethink this as a series of self-joins and flip it on its head. I find this much easier to understand, and it’s much faster (at least on this tiny dataset).

    1. url1, #isLocatedAt, y
    2. y, http://schema.org/geo, z
    3. z, (latitude, longitude)

    To do it in a single query, filter it on where predicate in ('latitude', 'longitude').

    select
      l2.predicate, l2.object
    from data l0
    join data l1 on l0.object = l1.subject and l1.predicate  = 'http://schema.org/geo'
    join data l2 on l1.object = l2.subject and l2.predicate in ('latitude', 'longitude')
    where l0.subject = 'url1' and l0.predicate = '#isLocatedAt'
    

    This is a couple orders of magnitude faster, though it would have to be run against a realistic amount of data to matter.

    QUERY PLAN
    Nested Loop  (cost=0.43..24.51 rows=1 width=548) (actual time=0.040..0.042 rows=2 loops=1)
      ->  Nested Loop  (cost=0.29..16.34 rows=1 width=32) (actual time=0.022..0.023 rows=1 loops=1)
            Join Filter: (l0.object = (l1.subject)::text)
            ->  Index Scan using subjectetpredicate on data l0  (cost=0.14..8.16 rows=1 width=32) (actual time=0.011..0.012 rows=1 loops=1)
                  Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
            ->  Index Scan using predicate on data l1  (cost=0.14..8.16 rows=1 width=548) (actual time=0.007..0.008 rows=1 loops=1)
                  Index Cond: ((predicate)::text = 'http://schema.org/geo'::text)
      ->  Index Scan using subjectetpredicate on data l2  (cost=0.14..8.16 rows=1 width=1064) (actual time=0.017..0.018 rows=2 loops=1)
            Index Cond: ((subject)::text = l1.object)
            Filter: ((predicate)::text = ANY ('{latitude,longitude}'::text[]))
    Planning Time: 0.192 ms
    Execution Time: 0.072 ms
    

    Demonstration.

    This approach is also a step towards generalizing it as a recursive CTE.


    Note that you don’t need an index on subject because you have an index on (subject, predicate).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search