Mysql - How to make COUNT(*) get executed quickly on a huge dataset?

MartinAJ
April 25, 2024
152 views
0 votes
2 Answers

I have the following query:

select count(*) as aggregate from users where date(created_at) < '2024-04-01';

It returns 2337355 which is a correct number. But it takes over 1.5 secs to be executed. Any idea how can I make it more optimal and faster?

Noted that I have an idex on users(created_at, type).

Also tried removing date() function like where created_at < '2024-04-01'; but still slow.

Answers

- BerndBuffen
- April 25, 2024 at 9:36 pm
- 0 votes
0
If possible, NEVER use a function in the WHERE on a field. Then MySQL has to read every line, call the function and can only then compare with the VALUE. This means that this will ALWAYS be a FULL TABLE SCAN and therefore cannot use an INDEX.

It could be more sensible, for example:
```
WHERE created_at BETWEEN '2024-04-01 00:00:00' AND '2024-04-01 23:59:59';
```
to use
Login or Signup to reply.

- BillKarwin
- April 25, 2024 at 9:55 pm
- 0 votes
0
COUNT() queries tend to more expensive than one might think. They’re about as expensive than any other query that examines millions of rows.

Some people assume because COUNT() returns a single integer result, that’s quite small, so the query must be quicker than fetching all the data. But that’s not the way it works.

The cost of the query is roughly proportional to the number of rows examined, not the size of the result.

To count rows, InnoDB has to examine each row, because of MVCC. Using an index helps to reduce the number of examined rows, but that could still require examining millions of rows.

So it’s pretty much the same in performance cost as a query that reads the same number of rows for some other purpose. It’s quicker to pass a single integer back to the client than returning millions of rows, of course, but network transfer speed isn’t the bottleneck here.

The workarounds to make your query faster include:
- Cache the result of the costly query, so you don’t have to run it as often.
- Cache partial results, such as daily counts, then sum up these counts for some date range you’re interested in. That involves examining fewer rows (e.g. one row per day), so it should be much faster.
- Get faster server hardware, and more RAM for the buffer pool. It may still have to examine millions of rows, but to the extent that it does this faster on a more powerful server, it will give you the result faster. It’s unlikely to improve performance by orders of magnitude, but it may at least give you a percentage improvement.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Mysql – How to make COUNT(*) get executed quickly on a huge dataset?

Answers