Postgresql - How to only show duplicates whose amount is equal to the number of distinct entries in a different column... Is it possible?

kreaturen
June 13, 2024
114 views
0 votes
2 Answers

I want to query a table for where the number of identical values in one column is equal to the number distinct values in another column. I’m not sure how to phrase that better, so I think it best to illustrate:

Tables

Left one represent the table I have, and the right represents what I want it reduced to. ‘Time’ column is timestamps, and ‘parts’ are the name of the devices that are providing the values on each timestamp. In this example there are only 4 distinct parts: A, B, C and D (but there could be more, there could be less).

Every part is supposed to always report a value on a timestamp, but sometimes they don’t all do it. I only want to keep the rows where all have reported. When that occur, the number of identical timestamps equals number of unique parts. In this example, that occurs only at t2, t4, t8 and t10.

I know how to get the query to recognize that there are 4 distinct parts, but I don’t know how to recognize that there are four t2-s, four t4-s and so on… and then compare that info and make a condition to only give me those. Would it be possible?

So far:

I’ve only created the table at fiddle, but as for an actual query I’m stumped.
I’m currently trying to store the number of distinct parts as a variable (to be used for setting up a condition later), but struggling with it. I’ve seen some PL/pgSQL methods for this, but I can’t seem to get them to work. I’m unsure how to use/"place them" alongside a standard postgres query. I might try using a CTE as "variable" instead…
As for the problem of comparing the number of distinct to multiple groups of duplicates within another column, I have no ideas – Not even what such a "maneuver" would be called…

(The table will later be pivoted into a proper (if though patchy) time series table, but before that happens I’d like the pre-table "cleaned"… Time-series is therefore not really within the scope of this current issue)

Tags: plpgsql postgresql

Answers

- Bergi
- June 13, 2024 at 2:18 pm
- 0 votes
0
No PL/pgSQL and no CTEs are needed for this. To count the number of distinct parts per timestamp, just use a GROUP BY clause:
```
SELECT *
FROM testTable
WHERE test_time IN (
  SELECT test_time
  FROM testTable
  GROUP BY test_time
  HAVING COUNT(DISTINCT test_part) = (SELECT COUNT(DISTINCT test_part) FROM testTable)
);
```
^{(online demo)}
Login or Signup to reply.

- ValNik
- June 13, 2024 at 6:16 pm
- 0 votes
0
See example
```
select *
from (
select *
  ,max(tn)over(partition by test_time) maxN
  ,max(tn)over() maxSensorCnt
from(
select *
  ,dense_rank()over(partition by test_time order by test_part) tn
from testTable
)tr
)tm
where maxn=maxsensorcnt
```
Login or Signup to reply.