I have a database table with close to 100 columns: 25 of them are STRING, the rest are INT/DECIMAL/BOOL.
This is the query I am trying to run. It is generated based on the search criteria the users have chosen to filter the data in the listings table:
SELECT "listings".*
FROM "listings"
WHERE "listings"."source" = 3
AND "listings"."listing_type" = 4
ORDER BY listings.created_at DESC
LIMIT 100 OFFSET 0
There are variations of this query, depending on the parameters the users select. The query above takes the Rails application more than 4 minutes to respond (on production, it results in the blue screen saying "Something went wrong…" with no error message in the log):
Completed 200 OK in 244372ms (Views: 71.4ms | ActiveRecord: 244263.2ms | Allocations: 17464)
I thought I might improve the performance by adding indices on the columns that appear in the WHERE and ORDER BY clauses, but it didn’t help in any way.
I also tried to run EXPLAIN ANALYSE SELECT "listings".* FROM "listings" WHERE "listings"."source" = 3 AND "listings"."listing_type" = 4 ORDER BY listings.created_at DESC LIMIT 100 OFFSET 0 in the pgAdmin client and got this:
"Limit (cost=0.42..1903.89 rows=100 width=1587) (actual time=64960.075..170237.003 rows=8 loops=1)"
" -> Index Scan Backward using index_listings_on_created_at on listings (cost=0.42..437892.59 rows=23005 width=1587) (actual time=64960.073..170236.988 rows=8 loops=1)"
" Filter: ((source = 3) AND (listing_type = 4))"
" Rows Removed by Filter: 1022248"
"Planning Time: 25.418 ms"
"Execution Time: 170237.100 ms"
Running this EXPLAIN ANALYSE command itself took 2 minutes and 51 seconds.
The slowness of loading, searching, and filtering the data in this table makes the whole application impossible to use.
DB table listings
- id (int)
- source (int)
- title (string)
- description (text)
- listing_type (int)
- transaction_type (int)
- region (int)
- uuid (uuid)
- ...other columns...
- created_at (datetime)
- updated_at (datetime)
Existing indices:
CREATE UNIQUE INDEX listings_pkey ON public.listings USING btree (id)
CREATE INDEX index_listings_on_listing_type ON public.listings USING btree (listing_type)
CREATE INDEX index_listings_on_transaction_type ON public.listings USING btree (transaction_type)
CREATE INDEX index_listings_on_region ON public.listings USING btree (region)
CREATE INDEX index_listings_on_created_at ON public.listings USING btree (created_at)
CREATE INDEX index_listings_on_uuid ON public.listings USING btree (uuid)
What are my options to improve its performance?
Note – I use will_paginate for pagination, and I learned that one source of performance issues in a Rails app is that the pagination gem always runs SELECT COUNT(*) FROM table, which is a problem for big tables.
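To illustrate why that count hurts on a large table: the exact count below has to visit a large portion of the table on every page load, while the second statement is a common PostgreSQL-side workaround that reads the planner's row estimate instead. The workaround is only a sketch of mine, not something from the question, and it gives an approximate total rather than an exact one:

-- Exact count: expensive on a big table, issued for every paginated page.
SELECT COUNT(*) FROM listings;

-- Approximate count from PostgreSQL's statistics: cheap, but only as fresh
-- as the last ANALYZE / autovacuum run.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'listings';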
2 Answers
An index on (source, listing_type, created_at) should be a magic solution to the query shown. I don’t know how it will do on the unseen variations described, though.

The fact that the row estimate is off by a factor of 23005/8 = 2875 certainly doesn’t help come up with robust plans. Do you know why it is off by so much? If you query on just the source and just the listing_type separately (with no LIMIT and no ORDER BY), are the estimates for those accurate?
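A minimal sketch of that composite index (the index name here is just illustrative), plus the two separate EXPLAINs for checking the single-column estimates:

-- Composite index covering the WHERE columns plus the ORDER BY column.
-- The planner can then jump straight to source = 3 AND listing_type = 4 and read
-- the matching rows in created_at order (scanned backward for DESC), so the
-- LIMIT 100 can stop early instead of filtering out a million rows.
CREATE INDEX index_listings_on_source_listing_type_created_at
    ON public.listings USING btree (source, listing_type, created_at);

-- Checking whether the per-column estimates are accurate, as asked above:
EXPLAIN ANALYZE SELECT * FROM listings WHERE source = 3;
EXPLAIN ANALYZE SELECT * FROM listings WHERE listing_type = 4;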
Your table is very wide, and at scale it is going to be hard to query against. You could implement more thorough indexing, but that’ll come with a write cost, and you’ll probably still never see great results with a select * approach.

It’s also possible your database isn’t optimized for your use case. Maybe you need a more powerful cluster, or more memory so you’re not reading from disk. It’s hard to know from what you’ve posted here.
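As an illustration of the select * point, narrowing the query to only the columns the listing page actually renders keeps each returned row small. Which columns those are depends on the view; the list below simply reuses columns from the schema in the question:

-- Instead of SELECT "listings".*, pull only what the page needs.
SELECT id, title, source, listing_type, transaction_type, region, created_at
FROM listings
WHERE source = 3
  AND listing_type = 4
ORDER BY created_at DESC
LIMIT 100 OFFSET 0;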
I’d recommend breaking that table apart, honestly. Rely on foreign keys, normalized data, and joins against leaner tables to increase performance.
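A rough sketch of what that split could look like (the side table, its columns, and the exact grouping are invented here for illustration; the right split depends on which columns are actually searched):

-- Keep the frequently filtered columns on listings, and move wide,
-- rarely searched columns into a 1:1 side table.
CREATE TABLE listing_details (
    listing_id  integer PRIMARY KEY REFERENCES listings (id),
    description text
    -- ...other wide, rarely searched columns...
);

-- The search/filter query stays on the lean table; the wide data is
-- joined in only for the rows that actually get displayed.
SELECT l.id, l.title, l.created_at, d.description
FROM listings AS l
JOIN listing_details AS d ON d.listing_id = l.id
WHERE l.source = 3
  AND l.listing_type = 4
ORDER BY l.created_at DESC
LIMIT 100;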