Need to figure out an efficient way to query a table using another table as a filter/config (postgres 14.5).
The filter table has 4 levels that match 4 levels in the data. Each level can be read as a RegExp with wildcard, where wildcard is null. But the rules are MUTUALLY EXCLUSIVE. meaning rule aa-* excludes the rule aa-ab-* and vice versa.
e.g. Filter Table
ID | Category | Level 1 | Level 2 | Level 3 | Level 4 |
---|---|---|---|---|---|
Rule1 | A | aa | null | null | null |
Rule2 | A | aa | ab | null | null |
Rule3 | A | ab | null | null | null |
Rule4 | A | ab | ac | aa | null |
In this case filter rule #1 is matching all aa-* data, except when it’s aa-ab-* (rule #2)
Similarly, rule #3 will match ab-* data, except ab-ac-aa-*
e.g. Data Table
Data | Category | Level 1 | Level 2 | Level 3 | Level 4 |
---|---|---|---|---|---|
Data1 | A | aa | aa | ac | aa |
Data2 | A | aa | aa | null | null |
Data3 | A | aa | ab | null | null |
Data4 | A | ab | ab | null | null |
Data5 | A | ab | ac | null | null |
Data6 | A | ab | ac | dd | null |
This way the join between the two should produce result like:
Data ID | Rule ID |
---|---|
Data1 | Rule1 |
Data2 | Rule1 |
Data3 | Rule2 |
Data4 | Rule3 |
Data5 | Rule3 |
Data6 | Rule3 |
Approach toward combining all levels into a string with "level-combo" seem to be inefficient and has gaps. The join based on coalesce(lvl, ”) is not helpful either, it’s not addressing the wildcard rules.
where coalesce(a.lvl1,'') = coalesce(c.lvl1,'')
and coalesce(a.lvl2,'') = coalesce(c.lvl2,'')
and coalesce(a.lvl3,'') = coalesce(c.lvl3,'')
and coalesce(a.lvl4,'') = coalesce(c.lvl4,'')
DATA to replicate:
WITH config (id, category, lvl1, lvl2, lvl3, lvl4) AS (
VALUES
(1, 's', null, null, null, null ),
(2, 's', 'u7', null, null, null ),
(3, 's', 'u6', 'u1', null, null ),
(4, 's', 'u5', 'ud', 'u2', null ),
(5, 's', 'u5', 'ud', 'u3', null ),
(6, 's', 'u5', 'ud', 'u4', 'ok' ),
(9, 's', 'u4', null, null, null ),
(7, 's', 'u4', 'u1', 'u2', 'u3' ),
(8, 's', 'u4', 'cu', 'u2', null )
),
datum (id, data_id, internal_id, start_date, end_date, category, lvl1, lvl2, lvl3, lvl4) AS (
VALUES
(1, 'x1', '111', '2022-01-01', '2022-12-01', 's', null, null, null, null ),
(2, 'x2', '112', '2022-01-01', '2022-12-01', 's', 'u7', null, null, null ),
(3, 'x3', '113', '2022-01-01', '2022-12-01', 's', 'u6', 'u1', null, null ),
(4, 'x4', '114', '2022-01-01', '2022-12-01', 's', 'u5', 'ud', 'u2', null ),
(5, 'x5', '115', '2022-01-01', '2022-12-01', 's', 'u5', 'ud', 'u3', null ),
(6, 'x6', '116', '2022-01-01', '2022-12-01', 's', 'u5', 'ud', 'u4', 'ok' ),
(9, 'x9', '119', '2022-01-01', '2022-12-01', 's', 'u4', null, null, null ),
(7, 'x7', '117', '2022-01-01', '2022-12-01', 's', 'u4', 'u1', 'u2', 'u3' ),
(8, 'x8', '118', '2022-01-01', '2022-12-01', 's', 'u4', 'cu', 'u2', null ),
(9, 'x2', '112', '2022-01-01', '2022-12-01', 's', 'u9', null, null, null ),
(10, 'x3', '113', '2022-01-01', '2022-12-01', 's', 'u5', 'u1', null, null ),
(11, 'x4', '114', '2022-01-01', '2022-12-01', 's', 'u5', 'dd', 'u2', null ),
(12, 'x5', '115', '2022-01-01', '2022-12-01', 's', 'u5', 'ud', 'u3', 'ck' ),
(13, 'x6', '116', '2022-01-01', '2022-12-01', 's', 'u5', 'ud', 'u4', 'no' )
)
SELECT
*
FROM config c
join datum d on c.category = d.category
and coalesce(c.lvl1, '') = coalesce(d.lvl1, '')
and ... ;
2
Answers
String concatenation maybe? (Note this is SSMS syntax, I don’t know postgresql)
This will return all matches, so you’ll also need to order your rules by priority, group the matches by dataID and return the minimum rule.
This will only work if id is unique.
your second datum table has a double 9
This will sum up all hits between both tables and the highest score for a c id wins
fiddle