skip to Main Content

I have an archive table in which only the changes to a record are stored with a validity date. An example would be a price archive for products.

And if you now want to select the status at a given date, I have the following SELECT.
The problem is that this gets slower and slower the more changes there are, because in the first step the database first reads all records and then in the next step appends those that have a greater validity date. In the end, only the ones with the largest validity date remain on the far left.

Is there any way to make this more effective?

SELECT d1.*
FROM data d1
LEFT JOIN data d2 
   ON d1.id = d2.id AND 
      d1.valid_from_date < d2.valid_from_date
WHERE 
   d2.id IS NULL AND 
   d1.valid_from_date <= '2023-01-01';

CREATE TABLE data (
    id int,
    valid_from_date DATE,
    foo varchar(255),
    bar varchar(255),
    PRIMARY KEY (id, valid_from_date)
);

2

Answers


  1. Try adding this index on the date and id columns.

    CREATE INDEX x_valid_from_date_id ON data(valid_from_date, id);
    

    EDIT According to your comment you already have an index (your PK) on (id, valid_from_date).

    EDIT If you’re looking for the latest row for each id on or before a particular date, this query might help.

    SELECT a.*
      FROM data a
      JOIN (  
           SELECT id, MAX(valid_from_date) valid_from_date
             FROM data
            WHERE valid_from_date <= '2023-01-01'
            GROUP BY id
      ) b ON a.id = b.id AND a.valid_from_date = b.valid_from_date
    

    The subquery gets the date you want for every id, then the JOIN retrieves the details. The MariaDB / MySQL query planner is quite clever about MAX … GROUP BY queries given appropriate indexes.

    Read Markus Winand’s excellent ebook to learn a lot more about this topic.

    Login or Signup to reply.
  2. You can try to use NOT EXISTS instead LEFT JOIN and NOT NULL condition like:

    SELECT d1.*
    FROM data d1
    WHERE 
       d1.valid_from_date <= '2023-01-01' AND
       NOT EXISTS (
         SELECT 1 
         FROM data d2 
         WHERE d1.id = d2.id AND 
            d1.valid_from_date < d2.valid_from_date
       );
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search