skip to Main Content

I have a table containing measurement information of several tests per devices:

  • device_id gives the information on which device was tested
  • measurement_no is an incrementing number giving the order in which tests has been performed
  • test gives you the name of the test which is performed
  • is_last_measurement_on_test is a boolean field, giving the information if the specific row is the last measurement of a test. It returns true, if the row is the last row of the device for an specific test. It returns false, if there is a subsequent row of the same device for the same test.
  • ok gives information is the test was okay (=true) or not okay (=false)
  • error_code gives you a specific error code if ok=false, or 0 if ok=true
WITH measurements (device_id,measurement_no,test,is_last_measurement_on_test,ok,error_code) AS ( VALUES
  -- case 1: all measurements good, expecting to show test 3 only
  ('d1',1,'test1',true,true,0),
  ('d1',2,'test2',true,true,0),
  ('d1',3,'test3',true,true,0),
  -- case 2: test 2, expecting to show test 2 only
  ('d2',1,'test1',true,true,0),
  ('d2',2,'test2',true,false,100),
  ('d2',3,'test3',true,true,0),
  -- case 3: test 2 und 3 bad, expecting to show test 2 only
  ('d3',1,'test1',true,true,0),
  ('d3',2,'test2',true,false,100),
  ('d3',3,'test3',true,false,200),
  -- case 4: test 2 bad on first try, second time good, expecting to show test 3 only
  ('d4',1,'test1',true,true,0),
  ('d4',2,'test2',false,false,100),
  ('d4',3,'test2',true,true,0),
  ('d4',4,'test3',true,true,0)
)
select * from measurements
where is_last_measurement_on_test=true

Now I want to filter these rows on following conditions per device:

  • Only the last measurement on each test should be considered -> that’s easy: filtering on is_last_measurement_on_test=true
  • For every device: If there is a bad result (ok=false) in any test where is_last_measurement_on_test=true, I want to display the first test on which the device failed.
  • For every device: If there is no bad result at all (ok=true) in any test where is_last_measurement_on_test=true, I want to display the last test on which the device passed.

For the given example above, I am expecting that only these rows to display:

  ('d1',3,'test3',true,true,0) 
  ('d2',2,'test2',true,false,100)
  ('d3',2,'test2',true,false,100)
  ('d4',4,'test3',true,true,0)

How can I receive this result? I already tried a lot on using first_value, for example

first_value(nullif(error_code,0)) over (partition by device_id)

but i wasn’t able to handle it in the way I wanted it to be.

2

Answers


  1. Having this sample data:

    CREATE TABLE measurements (
      device_id text,
      measurement_no integer,
      test text,
      is_last_measurement_on_test boolean,
      ok boolean,
      error_code integer
    );
    
    INSERT INTO measurements (device_id, measurement_no, test, is_last_measurement_on_test, ok, error_code)
    VALUES
      ('d1', 1, 'test1', true, true, 0),
      ('d1', 2, 'test2', true, true, 0),
      ('d1', 3, 'test3', true, true, 0),
      ('d2', 1, 'test1', true, true, 0),
      ('d2', 2, 'test2', true, false, 100),
      ('d2', 3, 'test3', true, true, 0),
      ('d3', 1, 'test1', true, true, 0),
      ('d3', 2, 'test2', true, false, 100),
      ('d3', 3, 'test3', true, false, 200),
      ('d4', 1, 'test1', true, true, 0),
      ('d4', 2, 'test2', false, false, 100),
      ('d4', 3, 'test2', true, true, 0),
      ('d4', 4, 'test3', true, true, 0);
    

    It will be like:

    WITH DataSource AS
    (
      SELECT *
           ,MIN(CASE WHEN ok = false THEN test END) OVER (PARTITION BY device_id) AS first_failed_test
           ,ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY test DESC) AS test_id
      FROM measurements
      WHERE is_last_measurement_on_test = true
    )
    SELECT device_id, measurement_no, test, is_last_measurement_on_test, ok, error_code
    FROM DataSource
    WHERE (first_failed_test IS NULL and test_id = 1)
        OR (first_failed_test = test)
    

    enter image description here

    The idea is get the name of the first fail test and to order the test using row_number starting from the latest one.

    The important part is that here I am ordering the test by there name. In your real scenario, I am guessing you have an record_id or date, which can be used to do this. So, you will need to change a little bit the code.

    Login or Signup to reply.
  2. distinct on gets you the one, "top" record per device_id.

    order by lets you establish the order according to which a record might or might not end up on "top". Since your 2nd and 3rd case require opposite ordering/priority of tests per device_id:

    1. earliest negative record when there are negatives
    2. latest record if there are no negatives

    You can flip that order accordingly, with a case.

    select distinct on (device_id) * 
    from measurements 
    where is_last_measurement_on_test
    order by device_id,   --Necessary for distinct on to return one row per device_id
             not ok desc, --If all is ok for a device_id, this does nothing. 
                          --Otherwise it'll put negative tests results first
             (case when not ok then -1 else 1 end)*measurement_no desc;
                          --When considering negative test results, it'll 
                          --put earliest first. Otherwise it'll put latest first.
    

    It’s a common misconception that order by section is somehow restricted to plain column names or aliases. Meanwhile, quoting the doc, it gives you as much freedom as select section does:

    The sort expression(s) can be any expression that would be valid in the query’s select list.

    Online demo.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search