PostgreSQL: Get first non-true row or last row

user10679526
February 7, 2023
168 views
0 votes
2 Answers

I have a table containing measurement information of several tests per devices:

device_id gives the information on which device was tested
measurement_no is an incrementing number giving the order in which tests has been performed
test gives you the name of the test which is performed
is_last_measurement_on_test is a boolean field, giving the information if the specific row is the last measurement of a test. It returns true, if the row is the last row of the device for an specific test. It returns false, if there is a subsequent row of the same device for the same test.
ok gives information is the test was okay (=true) or not okay (=false)
error_code gives you a specific error code if ok=false, or 0 if ok=true

WITH measurements (device_id,measurement_no,test,is_last_measurement_on_test,ok,error_code) AS ( VALUES
  -- case 1: all measurements good, expecting to show test 3 only
  ('d1',1,'test1',true,true,0),
  ('d1',2,'test2',true,true,0),
  ('d1',3,'test3',true,true,0),
  -- case 2: test 2, expecting to show test 2 only
  ('d2',1,'test1',true,true,0),
  ('d2',2,'test2',true,false,100),
  ('d2',3,'test3',true,true,0),
  -- case 3: test 2 und 3 bad, expecting to show test 2 only
  ('d3',1,'test1',true,true,0),
  ('d3',2,'test2',true,false,100),
  ('d3',3,'test3',true,false,200),
  -- case 4: test 2 bad on first try, second time good, expecting to show test 3 only
  ('d4',1,'test1',true,true,0),
  ('d4',2,'test2',false,false,100),
  ('d4',3,'test2',true,true,0),
  ('d4',4,'test3',true,true,0)
)
select * from measurements
where is_last_measurement_on_test=true

Now I want to filter these rows on following conditions per device:

Only the last measurement on each test should be considered -> that’s easy: filtering on is_last_measurement_on_test=true
For every device: If there is a bad result (ok=false) in any test where is_last_measurement_on_test=true, I want to display the first test on which the device failed.
For every device: If there is no bad result at all (ok=true) in any test where is_last_measurement_on_test=true, I want to display the last test on which the device passed.

For the given example above, I am expecting that only these rows to display:

  ('d1',3,'test3',true,true,0) 
  ('d2',2,'test2',true,false,100)
  ('d3',2,'test2',true,false,100)
  ('d4',4,'test3',true,true,0)

How can I receive this result? I already tried a lot on using first_value, for example

first_value(nullif(error_code,0)) over (partition by device_id)

but i wasn’t able to handle it in the way I wanted it to be.

Tags: postgresql sql

Answers

Having this sample data:

CREATE TABLE measurements (
  device_id text,
  measurement_no integer,
  test text,
  is_last_measurement_on_test boolean,
  ok boolean,
  error_code integer
);

INSERT INTO measurements (device_id, measurement_no, test, is_last_measurement_on_test, ok, error_code)
VALUES
  ('d1', 1, 'test1', true, true, 0),
  ('d1', 2, 'test2', true, true, 0),
  ('d1', 3, 'test3', true, true, 0),
  ('d2', 1, 'test1', true, true, 0),
  ('d2', 2, 'test2', true, false, 100),
  ('d2', 3, 'test3', true, true, 0),
  ('d3', 1, 'test1', true, true, 0),
  ('d3', 2, 'test2', true, false, 100),
  ('d3', 3, 'test3', true, false, 200),
  ('d4', 1, 'test1', true, true, 0),
  ('d4', 2, 'test2', false, false, 100),
  ('d4', 3, 'test2', true, true, 0),
  ('d4', 4, 'test3', true, true, 0);

It will be like:

WITH DataSource AS
(
  SELECT *
       ,MIN(CASE WHEN ok = false THEN test END) OVER (PARTITION BY device_id) AS first_failed_test
       ,ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY test DESC) AS test_id
  FROM measurements
  WHERE is_last_measurement_on_test = true
)
SELECT device_id, measurement_no, test, is_last_measurement_on_test, ok, error_code
FROM DataSource
WHERE (first_failed_test IS NULL and test_id = 1)
    OR (first_failed_test = test)

The idea is get the name of the first fail test and to order the test using row_number starting from the latest one.

The important part is that here I am ordering the test by there name. In your real scenario, I am guessing you have an record_id or date, which can be used to do this. So, you will need to change a little bit the code.

- Zegarek
- February 7, 2023 at 12:40 pm
- 0 votes
0
distinct on gets you the one, "top" record per device_id.

order by lets you establish the order according to which a record might or might not end up on "top". Since your 2nd and 3rd case require opposite ordering/priority of tests per device_id:
1. earliest negative record when there are negatives
2. latest record if there are no negatives
You can flip that order accordingly, with a case.
```
select distinct on (device_id) * 
from measurements 
where is_last_measurement_on_test
order by device_id,   --Necessary for distinct on to return one row per device_id
         not ok desc, --If all is ok for a device_id, this does nothing. 
                      --Otherwise it'll put negative tests results first
         (case when not ok then -1 else 1 end)*measurement_no desc;
                      --When considering negative test results, it'll 
                      --put earliest first. Otherwise it'll put latest first.
```
It’s a common misconception that order by section is somehow restricted to plain column names or aliases. Meanwhile, quoting the doc, it gives you as much freedom as select section does:

The sort expression(s) can be any expression that would be valid in the query’s select list.

Online demo.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.