skip to Main Content

I have a query over two tables — matchoverview

id, home_id, away_id, date, season, result

matchattributes

id, game_id, attribute_id, attribute_value

My query

select m.id from matchOverview m
  join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (3,4,5,6)
  group by m.id
     having sum(case when ma.attribute_id in (3,4)
     then ma.attribute_value end) > 3
     or sum(case when ma.attribute_id in (5,6)
     then ma.attribute_value end) > 3;

Which returns all match ids where the sum of attributes 3 and 4 or 5 and 6 is greater than 3.

This particular query returns 900k rows, unsurprisingly in phpmyadmin this query takes a deal of time, as I imagine it needs to format the results into a table, but it clocks the query at .0113 seconds.

Yet when I make this query over PHP it takes 15 seconds, if I alter the query to LIMIT to only 100 results, it runs almost instantly, leaving me with the belief the only possibility being the amount of data being transferred is what is slowing it.

But would it really take 15 seconds to transfer 1M 4 byte ints over the network?

Is the only solution to further limit the query so that it returns less results?

EDIT

Results of an EXPLAIN on my query

id  select_type  table  type   key             key     key_len ref                 rows    Extra
1   SIMPLE       m      index  PRIMARY         PRIMARY 4       NULL                2790717 Using index
1   SIMPLE       ma     ref    match,attribute match   4       opta_matches2.m.id  2       Using where

How I am timing my SQL query

$time_pre = microtime(true);
$quer = $db->query($sql);
$time_post = microtime(true);
$exec_time = $time_post - $time_pre;

Data from slow query log

# Thread_id: 15  Schema: opta_matches2  QC_hit: No
# Query_time: 15.594386  Lock_time: 0.000089  Rows_sent: 923962  Rows_examined: 15688514
# Rows_affected: 0  Bytes_sent: 10726615

I am ok with dealing with a 15 second query if it is because that is how long it takes the data to move over the network, but if the query or my table can be optimized that is the best solution

The row count is not the issue, the following query

select m.id from matchOverview m
  join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (1,2,3,4)
 group by m.id
  having sum(case when ma.attribute_id in (3,4)
   then ma.attribute_value end) > 8
  and sum(case when ma.attribute_id in (1,2)
   then ma.attribute_value end) = 0;

returns only 24 rows but also takes ~15 seconds

3

Answers


  1. phpMyAdmin doesn’t give you all results,
    it also using limit to default 25 results.

    If you change this limit by changing "Number of rows" select box or type the limit in query, It will take more time to run the query.

    Login or Signup to reply.
  2. I think if you rewrote the conditions, at a minimum you might find something out. For instance, I think this does the same as the second example (the 24 results one);

    SELECT
       m.id
     , at.total_12
     , at.total_34
    FROM matchOverview AS m
    JOIN (
        SELECT
           m.id
         , SUM(IF (ma.attribute_id IN(1,2), ma.attribute_value, 0)) AS total_12
         , SUM(IF (ma.attribute_id IN(3,4), ma.attribute_value, 0)) AS total_34
        FROM matchAttributes AS ma
        WHERE m.id = ma.match_id
        AND ma.attribute_id IN(1,2,3,4)
        GROUP BY m.id
    ) AS at
    WHERE at.total_12 > 0
    AND at.total_34 > 8
    

    It’s more verbose, but it could help triangulate where the bottleneck(s) come from more readily.

    For instance, if (a working) version of the above is still slow, then run the inner query with the GROUP BY intact. Still slow? Remove the GROUP BY. Move the GROUP BY/SUM into the outer query, what happens?

    That kinda thing. I can’t run it so I can’t work out a more precise answer, which I would like to know.

    Login or Signup to reply.
  3. There are probably two significant parts to the timing: Locate the rows and decide which ids to send; then send them. I will address both.

    Here’s a way to better separate the elapsed time for just the query (and not the network): SELECT COUNT(*) FROM (...) AS x; Where ‘…’ is the 1M-row query.

    Speeding up the query

    Since you aren’t really using matchoverview, let’s get rid of it:

    select  ma.match_id
        from  matchAttributes ma
        WHERE  ma.attribute_id in (3,4,5,6)
        group by  ma.match_id
        having  sum(case when ma.attribute_id in (3,4) then ma.attribute_value end) > 3
            or  sum(case when ma.attribute_id in (5,6) then ma.attribute_value end) > 3;
    

    And have a composite index with the columns in this order:

    INDEX(attribute_id, attribute_value, match_id)
    

    As for the speedy LIMIT, that is because it can stop short. But a LIMIT without an ORDER BY is rather meaningless. If you add an ORDER BY, it will have to gather all the results, sort them, and finally perform the LIMIT.

    Network transfer time

    Transferring millions of rows (I see 10.7MB in the slowlog) over the network is time-consuming, but takes virtually no CPU time.

    One EXPLAIN implies that there might be 2.8M rows; is that about correct? The slowlog says that about 16M rows are touched — this may be because of the two tables, join, group by, etc. My reformulation and index should decrease the 16M significantly, hence decrease the elapsed time (before the network transfer time).

    923K rows "sent" — What will the client do with that many rows. In general, I find that more than a few thousand rows "sent" indicates poor design.

    "take 15 seconds to transfer 1M 4 byte ints over the network" — That is elapsed time, and cannot be sped up except by sending fewer rows. (BTW, it is probably sent as strings of several digits, plus overhead for each row; I don’t whether the 10726615 is actual network bytes or counts only the ints.)

    "the ids are used in an internal calculation" — How do you calculate with ids? If you are looking up the ids in some other place, perhaps you can add complexity to the query, thereby doing more work before hitting the network; then shipping less data?

    If you want to discuss further, please provide SHOW CREATE TABLE. (It may have some details that don’t show up in your simplified table definition.)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search