skip to Main Content

I have a representation of a (binary) tree as a table in PostgreSQL. Each row is a node of the tree and each row has a parent column and a color.

The nodes at he leaves of the tree have a color (their color column is not null and can be either green or red).

My goal is to color the whole tree. With these rules:

  • If a parent has one child, then the parent’s color is the child’s color.
  • If a parent has two children and both have the same color then the parent color is its children color.
  • If a parent has two children and they have different color, then the parent color is gray.

How can I write a recursive query in PostgreSQL for this algorithm?

Example of a tree:

CREATE TEMPORARY TABLE family (
  node text
, parent text
, color text
);

INSERT INTO family (parent, node, color)
VALUES
  (null , 'A'   , null)
, ('A'  , 'AB'  , null)
, ('A'  , 'AC'  , null)
, ('AB' , 'ABD' , null)
, ('ABD', 'ABDE', 'green')
, ('AC' , 'ACF' , null)
, ('AC' , 'ACG' , null)
, ('ACF', 'ACFH', 'red')
, ('ACF', 'ACFI', 'green')
, ('ACG', 'ACGJ', 'red')
, ('ACG', 'ACGK', 'red')
;

This tree, fully colored:

A colored tree

2

Answers


  1. You can use a recursive cte, aggregating after you perform the join on the parent node in a subquery:

    with recursive cte(parent, node, color) as (
       select t.parent, t.node, case when t.c1 = t.c2 then t.c1 else 'gray' end    
       from (select f1.parent, f1.node, max(f.color) c1, min(f.color) c2 
             from family f 
             join family f1 on f1.node = f.parent where f.color is not null 
             group by f1.parent, f1.node) t
       union all
       select t1.parent, t1.node, case when t1.c1 = t1.c2 then t1.c1 else 'gray' end  
       from (select t.parent, t.node, max(t.color) c1, min(t.color) c2 
             from (select f.parent, f.node, c.color from cte c 
                   join family f on f.node = c.parent) t 
             group by t.parent, t.node) t1
    )
    select c.parent, c.node, c.color from cte c
    union all
    select f.parent, f.node, f.color from family f where f.color is not null
    

    See fiddle.

    Login or Signup to reply.
  2. A recursive CTE does not allow aggregation in the recursive term.

    I suggest to update the original table one level at a time, top-down:

    DO
    $do$
    DECLARE
       _max_depth int := (SELECT max(length(node)) - 1 FROM family);
       _depth int;
    BEGIN
       FOR _depth IN REVERSE _max_depth .. 1 LOOP
          UPDATE family p
          SET    color = c.color
          FROM  (
             SELECT parent, CASE WHEN min(color) = max(color) THEN min(color) ELSE 'gray' END AS color
             FROM   family
             WHERE  length(node) = _depth + 1
             GROUP  BY 1
             ) c
          WHERE  p.node = c.parent
          AND    p.color IS DISTINCT FROM c.color;  -- optional
       END LOOP;
    END
    $do$;
    

    fiddle

    If you don’t want to touch the original table, operate on a copy of the table, maybe a TEMPORARY table for best performance.

    I use length(node) as indicator of the nesting level. If that’s unreliable, replace it with a reliable nesting level. Either way, if the table is big and there are many levels, an index on the level indicator will help performance. For the demo that would be:

    CREATE INDEX family_level_idx ON family (length(node));
    

    The added WHERE clause p.color IS DISTINCT FROM c.color skips empty updates. See:

    Related:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search