skip to Main Content

I have some rules which gets fired if a criteria for a process gets fulfilled. I am calculating the performance of those rules in a confusion matrix form.

If the rule was fired, it means the process was predicted positive. So if the process was actually positive that’ll be a True Positive scenario
If the rule was fired, it means the process was predicted positive. So if the process was not actually positive that’ll be a False Positive scenario
If the rule_name is NULL, it means none of the rule was fired, So if the process was actually negative that’s a False Negative scenario.

Note: if a process is null in the rule_name, that’s a false negative for every rule because none of the rule got fired while the process actually failed.

id rule_name true_label predicted_label
1 rule1 positive positive
2 rule1 positive positive
3 rule1 negative positive
4 null positive positive
5 null positive positive
6 null negative positive
7 rule2 positive positive
8 rule2 negative positive
9 rule2 negative positive

My result set should look like:

True Positive False Positive True Negative False Negative
rule1 1 1 1 2
rule2 1 2 1 2

I am doing this in SQL. I can do this by filtering each rule_name i.e.


select 'rule1' as rule
    , count(case when rule_name = 'rule1' and true_label = 'positive' and predicted_label = 'positive' then 1 else null  end)as tp_count
    , count(case when rule_name = 'rule1' and true_label != 'positive' and predicted_label = 'positive' then 1 else null  end)as fp_count
    , count(case when (rule_name != 'rule1' or rule_name is null) and and true_label != 'positive' then 1 else null end) as tn_count
from table
;

I have multiple such rules so, I don’t want to do this calculation each rule with union all I am looking for a better way to do this calculation. My main challenge is to count the True Negatives where I need to count the rule_name NULLs as well.

2

Answers


  1. To solve this problem in a more dynamic and efficient way in SQL (specifically PostgreSQL), you can use a combination of GROUP BY, CASE, and window functions. Here’s a solution that avoids manually filtering each rule and dynamically handles all rules in a single query.

    SQL Query:

    WITH rule_performance AS (
    SELECT
        COALESCE(rule_name, 'no_rule') AS rule_name,
        COUNT(CASE WHEN true_label = 'positive' AND predicted_label = 'positive' THEN 1 END) AS tp_count,
        COUNT(CASE WHEN true_label != 'positive' AND predicted_label = 'positive' THEN 1 END) AS fp_count,
        COUNT(CASE WHEN (rule_name != r.rule_name OR rule_name IS NULL) AND true_label != 'positive' THEN 1 END) AS tn_count,
        COUNT(CASE WHEN rule_name IS NULL AND true_label = 'positive' THEN 1 END) AS fn_count
    FROM
        your_table t
    LEFT JOIN 
        (SELECT DISTINCT rule_name FROM your_table WHERE rule_name IS NOT NULL) r 
    ON 
        t.rule_name = r.rule_name OR t.rule_name IS NULL
    GROUP BY
        COALESCE(rule_name, 'no_rule')
    

    )
    SELECT
    rule_name,
    tp_count,
    fp_count,
    tn_count,
    fn_count
    FROM
    rule_performance
    ORDER BY
    rule_name;

    Now try this code.

    Login or Signup to reply.
  2. It looks like you simply want to group by the rule name, so as to get one result row per rule:

    select 
      rule_name,
      count(*) filter (where true_label = 'positive' and predicted_label = 'positive') as tp_count,
      count(*) filter (where true_label = 'negative' and predicted_label = 'positive') as fp_count,
      count(*) filter (where true_label = 'negative' and predicted_label = 'negative') as tn_count,
      count(*) filter (where true_label = 'positive' and predicted_label = 'negative') as fn_count
    from table
    group by rule_name
    order by rule_name;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search