Postgresql - Confusion Matrix in SQL

v_kumar
August 27, 2024
191 views
0 votes
2 Answers

I have some rules which gets fired if a criteria for a process gets fulfilled. I am calculating the performance of those rules in a confusion matrix form.

If the rule was fired, it means the process was predicted positive. So if the process was actually positive that’ll be a True Positive scenario
If the rule was fired, it means the process was predicted positive. So if the process was not actually positive that’ll be a False Positive scenario
If the rule_name is NULL, it means none of the rule was fired, So if the process was actually negative that’s a False Negative scenario.

Note: if a process is null in the rule_name, that’s a false negative for every rule because none of the rule got fired while the process actually failed.

id	rule_name	true_label	predicted_label
1	rule1	positive	positive
2	rule1	positive	positive
3	rule1	negative	positive
4	null	positive	positive
5	null	positive	positive
6	null	negative	positive
7	rule2	positive	positive
8	rule2	negative	positive
9	rule2	negative	positive

My result set should look like:

	True Positive	False Positive	True Negative	False Negative
rule1	1	1	1	2
rule2	1	2	1	2

I am doing this in SQL. I can do this by filtering each rule_name i.e.


select 'rule1' as rule
    , count(case when rule_name = 'rule1' and true_label = 'positive' and predicted_label = 'positive' then 1 else null  end)as tp_count
    , count(case when rule_name = 'rule1' and true_label != 'positive' and predicted_label = 'positive' then 1 else null  end)as fp_count
    , count(case when (rule_name != 'rule1' or rule_name is null) and and true_label != 'positive' then 1 else null end) as tn_count
from table
;

I have multiple such rules so, I don’t want to do this calculation each rule with union all I am looking for a better way to do this calculation. My main challenge is to count the True Negatives where I need to count the rule_name NULLs as well.

Answers

To solve this problem in a more dynamic and efficient way in SQL (specifically PostgreSQL), you can use a combination of GROUP BY, CASE, and window functions. Here’s a solution that avoids manually filtering each rule and dynamically handles all rules in a single query.

SQL Query:

WITH rule_performance AS (
SELECT
    COALESCE(rule_name, 'no_rule') AS rule_name,
    COUNT(CASE WHEN true_label = 'positive' AND predicted_label = 'positive' THEN 1 END) AS tp_count,
    COUNT(CASE WHEN true_label != 'positive' AND predicted_label = 'positive' THEN 1 END) AS fp_count,
    COUNT(CASE WHEN (rule_name != r.rule_name OR rule_name IS NULL) AND true_label != 'positive' THEN 1 END) AS tn_count,
    COUNT(CASE WHEN rule_name IS NULL AND true_label = 'positive' THEN 1 END) AS fn_count
FROM
    your_table t
LEFT JOIN 
    (SELECT DISTINCT rule_name FROM your_table WHERE rule_name IS NOT NULL) r 
ON 
    t.rule_name = r.rule_name OR t.rule_name IS NULL
GROUP BY
    COALESCE(rule_name, 'no_rule')

)
SELECT
rule_name,
tp_count,
fp_count,
tn_count,
fn_count
FROM
rule_performance
ORDER BY
rule_name;

Now try this code.

It looks like you simply want to group by the rule name, so as to get one result row per rule:

select 
  rule_name,
  count(*) filter (where true_label = 'positive' and predicted_label = 'positive') as tp_count,
  count(*) filter (where true_label = 'negative' and predicted_label = 'positive') as fp_count,
  count(*) filter (where true_label = 'negative' and predicted_label = 'negative') as tn_count,
  count(*) filter (where true_label = 'positive' and predicted_label = 'negative') as fn_count
from table
group by rule_name
order by rule_name;

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Confusion Matrix in SQL

Answers