I want to improve the performance of these 2 queries:
select object into latitude
from data
where predicate = 'latitude'
and subject = ( select object
from data
where subject = (select object
from data
where subject = 'url1' and predicate = '#isLocatedAt')
and predicate = 'http://schema.org/geo');
select object into Longitude
from data
where predicate = 'longitude'
and subject = ( select object
from data
where subject = (select object
from data
where subject = 'url1' and predicate = '#isLocatedAt')
and predicate = 'http://schema.org/geo');
I don’t have an index on data.object because it’s a text and it’s too big.
Explain analyse (for 1 requête):
QUERY PLAN
Index Scan using subjectetpredicate on data (cost=25.19..36.76 rows=3 width=63) (actual time=34.299..34.303 rows=1 loops=1)
Index Cond: (((subject)::text = $1) AND ((predicate)::text = 'latitude'::text))
InitPlan 2 (returns $1)
-> Index Scan using subjectetpredicate on data data_2 (cost=12.94..24.50 rows=3 width=63) (actual time=31.374..31.379 rows=1 loops=1)
Index Cond: (((subject)::text = $0) AND ((predicate)::text = 'geo'::text))
InitPlan 1 (returns $0)
-> Index Scan using subjectetpredicate on data data_1 (cost=0.69..12.25 rows=3 width=63) (actual time=0.329..0.332 rows=1 loops=1)
Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
Planning Time: 1.071 ms
Execution Time: 34.359 ms
2
Answers
Your query plan shows that you’re doing 6 index scans sequentially. Rewrite your script so that you have to do only 4, by storing the shared result in a temporary variable:
You can achieve the same in a single query (which doesn’t necessarily make it any faster or easier to read) by using a CTE or simply by flipping your subqueries to get a result with multiple columns:
Instead of nested subqueries, we can rethink this as a series of self-joins and flip it on its head. I find this much easier to understand, and it’s much faster (at least on this tiny dataset).
To do it in a single query, filter it on
where predicate in ('latitude', 'longitude')
.This is a couple orders of magnitude faster, though it would have to be run against a realistic amount of data to matter.
Demonstration.
This approach is also a step towards generalizing it as a recursive CTE.
Note that you don’t need an index on
subject
because you have an index on(subject, predicate)
.