
There are two large tables, num and pre, in the database:

\d num
                                     Table "public.num"
  Column  |          Type           | Collation | Nullable |             Default             
----------+-------------------------+-----------+----------+---------------------------------
 id       | integer                 |           | not null | nextval('num_id_seq'::regclass)
 adsh     | character varying(20)   |           |          | 
 tag      | character varying(256)  |           |          | 
 version  | character varying(20)   |           |          | 
 coreg    | character varying(256)  |           |          | 
 ddate    | date                    |           |          | 
 qtrs     | numeric(8,0)            |           |          | 
 uom      | character varying(20)   |           |          | 
 value    | numeric(28,4)           |           |          | 
 footnote | character varying(1024) |           |          | 

\d pre
                                     Table "public.pre"
  Column  |          Type          | Collation | Nullable |             Default             
----------+------------------------+-----------+----------+---------------------------------
 id       | integer                |           | not null | nextval('pre_id_seq'::regclass)
 adsh     | character varying(20)  |           |          | 
 report   | numeric(6,0)           |           |          | 
 line     | numeric(6,0)           |           |          | 
 stmt     | character varying(2)   |           |          | 
 inpth    | boolean                |           |          | 
 rfile    | character(1)           |           |          | 
 tag      | character varying(256) |           |          | 
 version  | character varying(20)  |           |          | 
 plabel   | character varying(512) |           |          | 
 negating | boolean                |           |          | 

Check how many records are in the tables:

select count(*) from num;
  count   
----------
 83862587
(1 row)

Time: 204945.436 ms (03:24.945)

select count(*) from pre;
  count   
----------
 36738034
(1 row)

Time: 100604.085 ms (01:40.604)
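
(As an aside, an approximate row count can be read from the planner statistics in pg_class instead of scanning the whole table; it is only as accurate as the last ANALYZE:)

select relname, reltuples::bigint as approx_rows
from pg_class
where relname in ('num', 'pre');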

Execute a long query:

explain analyze  select tag,uom,qtrs,value,ddate  from num 
    where adsh='0000320193-22-000108' and tag in 
    (select tag from pre where stmt='IS' and 
    adsh='0000320193-22-000108') and ddate='2022-09-30';

It took almost 7 minutes and 30 seconds.

                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=2000.00..2909871.29 rows=2 width=59) (actual time=357717.922..450523.035 rows=45 loops=1)
   Join Filter: ((num.tag)::text = (pre.tag)::text)
   Rows Removed by Join Filter: 61320
   ->  Gather  (cost=1000.00..1984125.01 rows=32 width=59) (actual time=190.355..92987.731 rows=678 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Parallel Seq Scan on num  (cost=0.00..1983121.81 rows=13 width=59) (actual time=348.753..331304.671 rows=226 loops=3)
               Filter: (((adsh)::text = '0000320193-22-000108'::text) AND (ddate = '2022-09-30'::date))
               Rows Removed by Filter: 27953970
   ->  Materialize  (cost=1000.00..925725.74 rows=43 width=33) (actual time=0.097..527.331 rows=91 loops=678)
         ->  Gather  (cost=1000.00..925725.53 rows=43 width=33) (actual time=65.880..357527.133 rows=96 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on pre  (cost=0.00..924721.22 rows=18 width=33) (actual time=201.713..357490.037 rows=32 loops=3)
                     Filter: (((adsh)::text = '0000320193-22-000108'::text) AND ((stmt)::text = 'IS'::text))
                     Rows Removed by Filter: 12245979
 Planning Time: 0.632 ms
 JIT:
   Functions: 27
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 5.870 ms, Inlining 272.489 ms, Optimization 367.828 ms, Emission 213.288 ms, Total 859.474 ms
 Execution Time: 450524.974 ms
(22 rows)

Time: 450526.084 ms (07:30.526)

How can I optimize the database to reduce the query time? Should I add indexes or turn something off (the DB is running on my local PC without any other users)?

3 Answers


  1. The subselect in the WHERE clause needs to be rewritten as a table join, which will be much faster. The DISTINCT may not be necessary, but I don't know the cardinality of your data, so I included it.

    select DISTINCT
          num.tag
        , num.uom
        , num.qtrs
        , num.value
        , num.ddate
    from num
    inner join pre
        on num.tag = pre.tag
        and num.adsh = pre.adsh
        and num.adsh = 'xxxx'
        and pre.stmt = 'yy'
        and num.ddate = '2022-09-30';
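
    An equivalent rewrite that preserves the original semi-join semantics without needing DISTINCT is an EXISTS subquery (a sketch using the same placeholder values; whether it beats the join is up to the planner):

    select num.tag, num.uom, num.qtrs, num.value, num.ddate
    from num
    where num.adsh = 'xxxx'
      and num.ddate = '2022-09-30'
      and exists (
          select 1
          from pre
          where pre.adsh = num.adsh
            and pre.stmt = 'yy'
            and pre.tag = num.tag
      );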
    
  2. It's performing a full table scan on both tables, as you can see from the plan nodes that say "Parallel Seq Scan" (i.e. sequential scan) on num and pre.

    This means it is looking at every row to check whether it should be included.

    To speed it up massively, create indexes on the columns used in the WHERE clauses (pre.stmt, pre.adsh, num.adsh and num.ddate), as sketched below. The query will then use the indexes to decide which rows to include, and since indexes are organised specifically for this task, performance will improve dramatically.
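
    For example, composite indexes along these lines should let the planner switch from sequential scans to index scans (a sketch; the index names are arbitrary and the column order should lead with the most selective column):

    CREATE INDEX idx_num_adsh_ddate ON num (adsh, ddate);
    CREATE INDEX idx_pre_adsh_stmt ON pre (adsh, stmt);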

  3. Adding a CTE would greatly improve your query:

    WITH t AS (
        SELECT tag FROM pre WHERE stmt = 'yy' AND adsh = 'xxxx'
    )
    select tag, uom, qtrs, value, ddate
    from num n
    JOIN t ON t.tag = n.tag
    where adsh = 'xxxx' and ddate = '2022-09-30';
    

    You can also create a specialized partial index on one of the columns if the most frequent queries are well known:

    CREATE INDEX idx_num_adshxxxx ON num(adsh) WHERE adsh='xxxx';
    

    This creates a very fast index covering just a small portion of the table.
    It is important to note that indexes are of limited use for ad-hoc selects that were not planned for; for large tables it is often better to run an index-scannable select and re-query the results from a CTE than to pay the cost of loading the entire table. As an example, this is a very common predicate in everyday database usage, and a common source of problems:

    WHERE LOWER(adsh) = 'xxxx';
    

    Notice how important it is for your queries to match the existing indexing. If you change the expression being searched, your indexes must match it or they will not be used; the same applies to predicates such as integercolumn::text = 'x' or date::text = '2019-10-01'. Adding an index on the matching expression solves this, as sketched below.
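
    For instance, a sketch of an expression index matching the LOWER() predicate above (the index name is arbitrary):

    CREATE INDEX idx_num_lower_adsh ON num (LOWER(adsh));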

    As tables grow, fewer and fewer arbitrary filters can be afforded: a table scan first fills the OS memory cache, then the same data is duplicated in the PostgreSQL cache, and only later does it stabilize. Queries that churn the PostgreSQL cache will slow down previously cached queries whenever a new arbitrary query is executed, if the server's cache memory is already at its limit.
