skip to Main Content

I need to compare few scenarios, which can be fulfilled either by calling:

  1. an additional timestamp column from mysql database or
  2. Looping over the resultant array.

Elaborating further:
CASE 1: 144 byte columns + 4 byte timestamp column for 10K rows, then looping on array of size 50.(download size- 1480000 Bytes)
CASE 2: 144 byte columns for 10K rows, looping on array of size 10000. (download size- 1440000 bytes)

Data download roughly 40KB more for Case1 while 10000 more loop iteration for case2.

Which of the 2 scenarios could be faster, downloading 40KB more or 10000 loop iterations?

2

Answers


  1. Your first scenario is by far the best. Here is why.

    SQL is designed to extract subsets of rows from tables. It’s designed to allow your tables to be many orders of magnitude bigger than your RAM. If you use a query like

    SELECT * 
      FROM mytable
     WHERE mytimestamp >= TODAY() - INTERVAL 1 DAY
       AND mytimestamp < TODAY()
    

    you will get all the rows with timestamps anytime yesterday, for example. If you put an index on the mytimestamp column, SQL will satisfy this query very quickly and only give the rows you need. And it will satisfy any queries looking for a range of timestamps similarly quickly.

    Login or Signup to reply.
  2. There are no answers that are true in 100% of situations. For example, I would prefer to do the first query when I use a fast enough network (anything 1Gbps or higher) and data transfer is free. The difference in the example you show is 40,000 bytes, but it’s only 2.7% of the total.

    On the other hand, if you need to optimize to minimize bandwidth usage, that’s a different scenario. Like if you are transmitting data to a satellite over a very slow link. Or you have a lot of competing network traffic (enough to use up all the bandwidth), and saving 2.7% is significant. Or if you are using a cloud vendor that charges for total bytes transferred on the network.

    You aren’t taking into account the overhead of executing 1000 separate queries. That means 1000x the bytes sent to the database server, as you send queries. That takes some network bandwidth too. Also the database server needs to parse and optimize each query (MySQL does not cache optimization plans as some other RDBMS products do). And then begin executing the query, starting with an index lookup without the context of the previous query result.

    "Performance" is a slippery term. You seem to be concerned only with bandwidth use, but there are other ways of measuring performance.

    • Throughput (bytes per second)
    • Latency (seconds per response)
    • Wait time (seconds until the request begins)

    All of these can be affected by the total load, i.e. number of concurrent requests. A busy system may have traffic congestion or request queueing.

    You can see that this isn’t a simple problem.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search