skip to Main Content

I want to display the count of hosts/servers that do not have a given package installed (example: "abc") in a Grafana dashboard with Postgres as data source.

Example table of a single server with "abc" package present which is displayed in Grafana:

Hostname Package version Package_installed_on_server
server1 abc 10.0.1 true
server1 abc 10.0.2 false
server1 package2 3.1 true
server1 package3 4.1.1 true
server2 package2 3.1 true
server2 package3 4.1.1 true
server2 package4 10.0.1 true
Server3 package2 3.1 true
Server3 package3 4.1.1 true

For example, I have 10,000 servers and in 400 servers the package "abc" is not installed. Then Grafana should display the count of servers where no version of "abc" is present.

This is what I tried. With the above example the output should be 2 where package abc is not present. But I am not getting the output as expected (or 2).

SELECT count(*) 
FROM (
     SELECT DISTINCT hostname     
     FROM scanner_table     
     GROUP BY hostname     
     HAVING COUNT(CASE WHEN package = 'abc' THEN 1 END) = 0 
) AS host_without_abc;

How to solve this?

2

Answers


  1. Simple, naive, slow, first implementation – count all hosts – count all hosts with abc package = hosts without abc package:

    SELECT COUNT(*)
    FROM (
      (
        SELECT DISTINCT hostname  
        FROM scanner_table
      ) 
      EXCEPT
      (
        SELECT DISTINCT hostname  
        FROM scanner_table     
        WHERE package = 'abc'
      ) 
    )
    

    More mature solution: left join for the same hostname with selected package and count hostnames, where join is empty/null:

    SELECT COUNT(DISTINCT a.hostname) 
    FROM scanner_table AS a
    LEFT JOIN scanner_table AS b
      ON a.hostname=b.hostname AND b.package = 'abc'
    WHERE b.hostname IS NULL
    
    Login or Signup to reply.
  2. Depending on undisclosed details, one or the other query is simpler / faster.

    Setup

    CREATE TABLE scanner_table (
      hostname text NOT NULL  -- !
    , package text NOT NULL  -- !
    , version text
    , package_installed_on_server bool
    );
    
    INSERT INTO scanner_table VALUES
      ('server1', 'abc'     , '10.0.1', true)
    , ('server1', 'abc'     , '10.0.2', false)
    , ('server1', 'package2', '3.1'   , true)
    , ('server1', 'package3', '4.1.1' , true)
    , ('server2', 'package2', '3.1'   , true)
    , ('server2', 'package3', '4.1.1' , true)
    , ('server2', 'package4', '10.0.1', true)
    , ('Server3', 'package2', '3.1'   , true)
    , ('Server3', 'package3', '4.1.1' , true)
    ;
    

    This multicolumn index helps all below queries (if table is vacuum’ed enough), some more some less;

    CREATE INDEX scanner_table_hostname_package_idx ON scanner_table (hostname, package);
    

    You should have a hostname table with one row per relevant host.
    Consider creating one if you don’t:

    CREATE TABLE hostname (hostname text PRIMARY KEY);
    INSERT INTO hostname VALUES
      ('server1')
    , ('server2')
    , ('Server3')
    ;
    

    Queries

    Fastest if table hostname exists. (FASTEST overall.)
    See:

    SELECT count(*)
    FROM  hostname h
    WHERE NOT EXISTS (
       SELECT FROM scanner_table s
       WHERE  s.hostname = h.hostname
       AND    s.package = 'abc'
       );
    

    Fastest without table hostname and only few rows per hostname.
    Similar to what you tried (which should actually work, inefficiently).

    SELECT count(*)
    FROM  (
       SELECT FROM scanner_table  -- SELECT list can stay empty
       GROUP  BY hostname
       HAVING bool_and(package <> 'abc') -- assuming column package defined NOT NULL
       ) sub;
    

    Emulating an index skip scan will be fastest without table hostname and many rows per hostname.
    Needs the index to be fast. See:

    WITH RECURSIVE cte AS (
       (
       SELECT hostname, package
       FROM   scanner_table s
       ORDER  BY hostname, package <> 'abc'
       LIMIT  1
       )
    
       UNION ALL
       (
       SELECT s.hostname, s.package
       FROM   cte c
       JOIN   scanner_table s ON s.hostname > c.hostname
       ORDER  BY s.hostname, s.package <> 'abc'
       LIMIT  1
       )
       )
    SELECT count(*)
    FROM   cte
    WHERE  package <> 'abc';
    

    fiddle

    The expression package <> 'abc' in ORDER BY sorts the package ‘abc’ on top. Why?

    If column package can be NULL, you need to do more for some queries.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search