I have a database table with close to 100 columns: 25 of them are STRING, the rest are INT/DECIMAL/BOOL.
This is the query I am trying to run. It is generated based on the search criteria the users have chosen to filter the data in the listings table:
SELECT "listings".*
FROM "listings"
WHERE "listings"."source" = 3
AND "listings"."listing_type" = 4
ORDER BY listings.created_at DESC
LIMIT 100 OFFSET 0
There are variations of this query, depending on the parameters the users select. The query above takes the Rails application more than 4 minutes to respond (on production, it results in the blue screen saying "Something went wrong…" with no error message in the log):
Completed 200 OK in 244372ms (Views: 71.4ms | ActiveRecord: 244263.2ms | Allocations: 17464)
I thought I might improve the performance by adding indices on the columns that appear in the WHERE and ORDER BY clauses, but it didn’t help in any way.
I also tried to run EXPLAIN ANALYSE SELECT "listings".* FROM "listings" WHERE "listings"."source" = 3 AND "listings"."listing_type" = 4 ORDER BY listings.created_at DESC LIMIT 100 OFFSET 0 in the pgAdmin client and got this:
"Limit (cost=0.42..1903.89 rows=100 width=1587) (actual time=64960.075..170237.003 rows=8 loops=1)"
" -> Index Scan Backward using index_listings_on_created_at on listings (cost=0.42..437892.59 rows=23005 width=1587) (actual time=64960.073..170236.988 rows=8 loops=1)"
" Filter: ((source = 3) AND (listing_type = 4))"
" Rows Removed by Filter: 1022248"
"Planning Time: 25.418 ms"
"Execution Time: 170237.100 ms"
Running this EXPLAIN ANALYSE command itself took 2 minutes and 51 seconds.
The slowness of loading, searching, and filtering the data in this table makes the whole application impossible to use.
DB table listings
- id (int)
- source (int)
- title (string)
- description (text)
- listing_type (int)
- transaction_type (int)
- region (int)
- uuid (uuid)
- ...other columns...
- created_at (datetime)
- updated_at (datetime)
Existing indices:
CREATE UNIQUE INDEX listings_pkey ON public.listings USING btree (id)
CREATE INDEX index_listings_on_listing_type ON public.listings USING btree (listing_type)
CREATE INDEX index_listings_on_transaction_type ON public.listings USING btree (transaction_type)
CREATE INDEX index_listings_on_region ON public.listings USING btree (region)
CREATE INDEX index_listings_on_created_at ON public.listings USING btree (created_at)
CREATE INDEX index_listings_on_uuid ON public.listings USING btree (uuid)
What are my options to improve its performance?
Note – I use will_paginate for pagination, and I learned that one source of performance issues in a Rails app is that the pagination gem always runs SELECT COUNT(*) FROM table, which is a problem for big tables.
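To illustrate why that count hurts on a large table: the exact count below has to visit a large portion of the table on every page load, while the second statement is a common PostgreSQL-side workaround that reads the planner's row estimate instead. The workaround is only a sketch of mine, not something from the question, and it gives an approximate total rather than an exact one:

-- Exact count: expensive on a big table, issued for every paginated page.
SELECT COUNT(*) FROM listings;

-- Approximate count from PostgreSQL's statistics: cheap, but only as fresh
-- as the last ANALYZE / autovacuum run.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'listings';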
2 Answers
An index on (source, listing_type, created_at) should be a magic solution to the query shown. I don’t know how it will do on the unseen variations described, though.

The fact that the row estimate is off by a factor of 23005/8 = 2875 certainly doesn’t help come up with robust plans. Do you know why it is off by so much? If you query on just the source and just the listing_type separately (with no LIMIT and no ORDER BY), are the estimates for those accurate?
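A minimal sketch of that composite index (the index name here is just illustrative), plus the two separate EXPLAINs for checking the single-column estimates:

-- Composite index covering the WHERE columns plus the ORDER BY column.
-- The planner can then jump straight to source = 3 AND listing_type = 4 and read
-- the matching rows in created_at order (scanned backward for DESC), so the
-- LIMIT 100 can stop early instead of filtering out a million rows.
CREATE INDEX index_listings_on_source_listing_type_created_at
    ON public.listings USING btree (source, listing_type, created_at);

-- Checking whether the per-column estimates are accurate, as asked above:
EXPLAIN ANALYZE SELECT * FROM listings WHERE source = 3;
EXPLAIN ANALYZE SELECT * FROM listings WHERE listing_type = 4;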
Your table is very wide, and at scale it is going to be hard to query against. You could implement more thorough indexing, but that’ll come with a write cost, and you’ll probably still never see great results with a select * approach.

It’s also possible your database isn’t optimized for your use case. Maybe you need a more powerful cluster, or more memory so you’re not reading from disk. It’s hard to know from what you’ve posted here.
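As an illustration of the select * point, narrowing the query to only the columns the listing page actually renders keeps each returned row small. Which columns those are depends on the view; the list below simply reuses columns from the schema in the question:

-- Instead of SELECT "listings".*, pull only what the page needs.
SELECT id, title, source, listing_type, transaction_type, region, created_at
FROM listings
WHERE source = 3
  AND listing_type = 4
ORDER BY created_at DESC
LIMIT 100 OFFSET 0;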
I’d recommend breaking that table apart, honestly. Rely on foreign keys, normalized data, and joins against leaner tables to increase performance.
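A rough sketch of what that split could look like (the side table, its columns, and the exact grouping are invented here for illustration; the right split depends on which columns are actually searched):

-- Keep the frequently filtered columns on listings, and move wide,
-- rarely searched columns into a 1:1 side table.
CREATE TABLE listing_details (
    listing_id  integer PRIMARY KEY REFERENCES listings (id),
    description text
    -- ...other wide, rarely searched columns...
);

-- The search/filter query stays on the lean table; the wide data is
-- joined in only for the rows that actually get displayed.
SELECT l.id, l.title, l.created_at, d.description
FROM listings AS l
JOIN listing_details AS d ON d.listing_id = l.id
WHERE l.source = 3
  AND l.listing_type = 4
ORDER BY l.created_at DESC
LIMIT 100;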