I have some rules which gets fired if a criteria for a process gets fulfilled. I am calculating the performance of those rules in a confusion matrix form.
If the rule was fired, it means the process was predicted positive. So if the process was actually positive that’ll be a True Positive scenario
If the rule was fired, it means the process was predicted positive. So if the process was not actually positive that’ll be a False Positive scenario
If the rule_name is NULL, it means none of the rule was fired, So if the process was actually negative that’s a False Negative scenario.
Note: if a process is null in the rule_name, that’s a false negative for every rule because none of the rule got fired while the process actually failed.
id | rule_name | true_label | predicted_label |
---|---|---|---|
1 | rule1 | positive | positive |
2 | rule1 | positive | positive |
3 | rule1 | negative | positive |
4 | null | positive | positive |
5 | null | positive | positive |
6 | null | negative | positive |
7 | rule2 | positive | positive |
8 | rule2 | negative | positive |
9 | rule2 | negative | positive |
My result set should look like:
True Positive | False Positive | True Negative | False Negative | |
---|---|---|---|---|
rule1 | 1 | 1 | 1 | 2 |
rule2 | 1 | 2 | 1 | 2 |
I am doing this in SQL. I can do this by filtering each rule_name i.e.
select 'rule1' as rule
, count(case when rule_name = 'rule1' and true_label = 'positive' and predicted_label = 'positive' then 1 else null end)as tp_count
, count(case when rule_name = 'rule1' and true_label != 'positive' and predicted_label = 'positive' then 1 else null end)as fp_count
, count(case when (rule_name != 'rule1' or rule_name is null) and and true_label != 'positive' then 1 else null end) as tn_count
from table
;
I have multiple such rules so, I don’t want to do this calculation each rule with union all
I am looking for a better way to do this calculation. My main challenge is to count the True Negatives where I need to count the rule_name
NULLs as well.
2
Answers
To solve this problem in a more dynamic and efficient way in SQL (specifically PostgreSQL), you can use a combination of GROUP BY, CASE, and window functions. Here’s a solution that avoids manually filtering each rule and dynamically handles all rules in a single query.
SQL Query:
)
SELECT
rule_name,
tp_count,
fp_count,
tn_count,
fn_count
FROM
rule_performance
ORDER BY
rule_name;
Now try this code.
It looks like you simply want to group by the rule name, so as to get one result row per rule: