
I faced this scenario in one of my interviews. There is a table with millions of records and only two columns: id, which is the primary key, and time_stamp, which is of type DATETIME. I want to filter the data based on time_stamp.
NOTE: I must not index the time_stamp column. In that case, how can I retrieve the data faster?

This is the answer I gave my interviewer:

  • We can partition the table to improve retrieval performance.
  • We can shard the table to improve retrieval performance.

He said I should not apply those two concepts to this question, and asked whether there are any other ways to do it. I’m curious about this question and am looking for the answer.

My questions:

  • Is it possible to retrieve the data by time_stamp efficiently without indexing, partitioning, or sharding?
  • Is indexing the time_stamp column a bad idea?

2 Answers


  1. Using an index to reduce the examined rows is a legitimate way to do this in MySQL, and it’s the most effective. There’s no issue with using an index on a datetime data type. I’ve done it frequently myself.

    I assume the interviewer was just trying to assess your knowledge to come up with a solution other than the default, most common way to do it. Testing the breadth of your knowledge, in other words.

    There might be situations where one can’t add an index to a table. For example, not enough storage space, or no time to build the index.

    Partitioning and sharding are reasonable answers.

    Another answer could be caching. That is, storing a copy of the data you need in high-speed RAM. Of course this introduces other issues, like how do you know the copy of the data in cache is in sync with the data in the table, but it is a popular method of improving average performance.
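
    A minimal sketch of the caching idea, not a definitive implementation. It assumes a hypothetical `events` table and uses Python's built-in sqlite3 only so that the example runs anywhere; the names `ids_for_day` and `record_event` are invented for illustration. The first lookup for a given day scans the unindexed table, repeated lookups are served from RAM, and every write clears the cache so that it cannot drift out of sync.

```python
import sqlite3
from datetime import date, timedelta
from functools import lru_cache

# Stand-in for the real table (assumption: named "events", no index on time_stamp).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, time_stamp TEXT)")
conn.executemany(
    "INSERT INTO events (id, time_stamp) VALUES (?, ?)",
    [(day, f"2024-01-{day:02d} 12:00:00") for day in range(1, 31)],
)


@lru_cache(maxsize=128)
def ids_for_day(day: str) -> tuple:
    """Return the ids whose time_stamp falls on `day` (YYYY-MM-DD).

    A cache miss performs a full scan; a cache hit never touches the table.
    """
    start = date.fromisoformat(day)
    end = start + timedelta(days=1)
    rows = conn.execute(
        "SELECT id FROM events "
        "WHERE time_stamp >= ? AND time_stamp < ? ORDER BY id",
        (start.isoformat() + " 00:00:00", end.isoformat() + " 00:00:00"),
    ).fetchall()
    return tuple(row[0] for row in rows)


def record_event(event_id: int, time_stamp: str) -> None:
    """Insert a row and drop all cached results (crude but safe invalidation)."""
    conn.execute(
        "INSERT INTO events (id, time_stamp) VALUES (?, ?)",
        (event_id, time_stamp),
    )
    ids_for_day.cache_clear()
```

    Clearing the whole cache on every write is the simplest way to stay consistent; a real system would more likely invalidate only the affected day, or accept slightly stale results.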

    It’s also possible the interviewer had a very specific and clever answer he was seeking, some uncommon solution that almost no one would know. This is generally a low-quality interview question, because it tells the interviewer very little about the candidate. It only tells them one fact: does the interview candidate know that specific trivia or not. Unfortunately, it’s a pretty common interview style.

    Sounds to me like you dodged a bullet.

  2. (I agree with Bill, but here are my two cents.)

    Partitioning and Sharding each would [probably] take longer to implement than indexing. And neither would lead to a SELECT ... WHERE ts ... that is as efficient as with an INDEX.

    With only a PK and ts, InnoDB would build the INDEX as effectively a "table" sorted by ts and containing all the table’s columns, because every InnoDB secondary index entry also stores the primary key. Again, this is very effective.
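
    A rough way to see the "index contains every column" effect. This is a sketch under assumptions, not MySQL itself: it uses SQLite through Python's sqlite3 module, where an index on the timestamp also carries the integer primary key, and the table name `events` and index name `idx_ts` are made up. The query plan should report a covering index, meaning the range query is answered from the index alone. In MySQL the comparable check is to run EXPLAIN and look for "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, time_stamp TEXT)")
conn.execute("CREATE INDEX idx_ts ON events (time_stamp)")

# Ask the planner how it would run a range filter on time_stamp.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, time_stamp FROM events "
    "WHERE time_stamp >= ? AND time_stamp < ?",
    ("2024-01-01 00:00:00", "2024-01-02 00:00:00"),
).fetchall()

# The last field of each plan row is the human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```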

    If the problem had pointed out that the rows were inserted in chronological order, then you could play games with the id since it works very much like ts. But then, how would the query get started with WHERE ts ...?
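
    One way to "play games with the id" is sketched below, strictly under the assumption that time_stamp never decreases as id increases. It binary-searches the id range using primary-key seeks only, so it touches roughly log2(N) rows instead of scanning the table. The `events` table name and both function names are hypothetical, and sqlite3 is used only to keep the example runnable; timestamps are assumed to be stored as "YYYY-MM-DD HH:MM:SS" strings so that they compare correctly.

```python
import sqlite3


def first_id_at_or_after(conn, target_ts):
    """Smallest id whose time_stamp >= target_ts, or None if there is none.

    Only valid when time_stamp is non-decreasing in id (assumption).
    Gaps in the id sequence are tolerated.
    """
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM events").fetchone()
    if lo is None:  # empty table
        return None
    result = None
    while lo <= hi:
        mid = (lo + hi) // 2
        # Primary-key seek: first existing row at or after the probe id.
        row = conn.execute(
            "SELECT id, time_stamp FROM events "
            "WHERE id >= ? ORDER BY id LIMIT 1",
            (mid,),
        ).fetchone()
        if row is None:
            hi = mid - 1
            continue
        row_id, row_ts = row
        if row_ts >= target_ts:
            result = row_id   # candidate; keep looking to the left
            hi = mid - 1
        else:
            lo = row_id + 1   # everything up to row_id is too early
    return result


def ids_between(conn, start_ts, end_ts):
    """Ids with start_ts <= time_stamp < end_ts, found via the PK only."""
    start = first_id_at_or_after(conn, start_ts)
    if start is None:
        return []
    end = first_id_at_or_after(conn, end_ts)
    if end is None:
        rows = conn.execute(
            "SELECT id FROM events WHERE id >= ? ORDER BY id", (start,)
        )
    else:
        rows = conn.execute(
            "SELECT id FROM events WHERE id >= ? AND id < ? ORDER BY id",
            (start, end),
        )
    return [r[0] for r in rows]
```

    If rows can arrive out of chronological order, even occasionally, this approach silently returns wrong results, which is one reason an index on the timestamp remains the safer answer.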

    Indexing on a DATETIME or TIMESTAMP is fine. But it may not be safe to make it UNIQUE, since two rows can share the same timestamp. When there are more columns, it is also quite reasonable to have INDEX(order_id, ts). Or even

    PRIMARY KEY(order_id, ts, id),
    INDEX(id)
    

    This "clusters" together the rows for one "order", making such lookups slightly more efficient. And it orders them chronologically, should that be desirable. There is still id AUTO_INCREMENT to uniquely identify each row, but that may not be necessary. id is in the PK to make it unique (if ts cannot be trusted).

    Partitioning is rarely useful. Sharding is complex and hopefully beyond anything your job will need.

    Partitioning may actually consume more disk space than indexing.

    Bottom line — The interview question was not a good one.
