skip to Main Content

(I had originally asked this question on the DBA StackExchange, but didn’t get much activity in it).

In my DB, I have tables for items and connections between them, creating a likely sparse graph, with islands:

CREATE TABLE IF NOT EXISTS items (
    --------------------------------------------------------
    id    UUID         NOT NULL DEFAULT uuid_generate_v4(),
    --------------------------------------------------------
    title VARCHAR(300) NOT NULL,
    --------------------------------------------------------
    CONSTRAINT     items_pk
       PRIMARY KEY (id)
    --------------------------------------------------------
);

CREATE TABLE IF NOT EXISTS connections (
    ---------------------------------------------------------------------
    id                  UUID         NOT NULL DEFAULT uuid_generate_v4(),
    ---------------------------------------------------------------------
    origin_item_id      UUID         NOT NULL,
    destination_item_id UUID         NOT NULL,
    ---------------------------------------------------------------------
    title               VARCHAR(300) NOT NULL,
    ---------------------------------------------------------------------
    CONSTRAINT     origin_item_fk
       FOREIGN KEY (origin_item_id)
        REFERENCES items (id),
    CONSTRAINT     destination_item_fk
       FOREIGN KEY (destination_item_id)
        REFERENCES items (id),
    CONSTRAINT     connections_pk
       PRIMARY KEY (id)
    ---------------------------------------------------------------------
);

I’m using node-pg, and I would like to do a breadth traversal from an inital seed (e.g. url parameter on ExpressJS), up to a certain depth (from what I see in the docs, the depth part is probably the easiest).

Ideally, I would like things to be returned in a nested way, but, from reading these docs (WITH queries), I couldn’t figure out how, or if it is even possible.

INSERT INTO items (title)
VALUES ('Software'), ('Pyongyang'), ('Burma'), ('Shenzhen'), ('Jerusalem'), ('Hostage');

INSERT INTO connections (origin_item_id, destination_item_id, title)
VALUES
    (
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 1'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Burma'),
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        'Same Author - Guy Delisle - 2'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 3'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Hostage'),
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        'Same Author - Guy Delisle - 4'
    ),

The recursive query would then result in, for Pyongyang and maxDepth = 2 — note neither Hostage nor Software should appear then —:

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzhen",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 2",
            "origin_item": {
              "title": "Jerusalem"
            }
          }
        ]
      }
    },
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma"
      }
    }
  ]
}

(I’m not 100% sure if I’ve covered all the cases with this example.)

Is this type of nesting even possible in PostgreSQL (with, say, jsonb_build_object or jsonb_agg)? How would I do it?

Here is what I’ve been able to do:

WITH RECURSIVE connections_and_items AS (
    SELECT
         c.id AS connection_id,
         c.title AS connection_title,
         c.origin_item_id,
         i.title AS origin_item_title,
         c.destination_item_id,
        it.title AS destination_item_title

    FROM connections c
    JOIN items       i
      ON  i.id = c.origin_item_id
    JOIN items       it
      ON it.id = c.destination_item_id
),
connected_items AS (
    SELECT *
    FROM connections_and_items
    WHERE origin_item_id = '${itemId}'::UUID

    UNION 
        
    SELECT c.*
    FROM connections_and_items c
    JOIN connected_items       ci
      ON ci.destination_item_id = c.origin_item_id
      OR ci.origin_item_id      = c.destination_item_id
) 

SELECT 
    ci.connection_id, 
    ci.connection_title,
    jsonb_build_object(
        'id',    ci.origin_item_id,
        'title', ci.origin_item_title
    ) origin_item,
    jsonb_build_object(
        'id',    ci.destination_item_id,
        'title', ci.destination_item_title
    ) destination_item

FROM connected_items ci

This actually does traverse the graph successfully, in both directions (!). However, I haven’t been able to add a maxDepth constraint (having trouble detecting cycles), and the nesting is still not there.

So, in summary, here’s what I have and what I’m missing:

  • ✔ Recursive Breadth Traversal
  • ✗ Max Depth
  • ✗ Nested JSON
  • ✔ Filtering the Initial Seed

References

2

Answers


  1. For a maximum depth restriction try the following logic (i.e. DIY level counter)

    WITH RECURSIVE t(depth) AS (
        SELECT 1
        UNION ALL
        SELECT depth+1 FROM t WHERE depth < 10 -- specify max depth here
    )
    SELECT n FROM t;
    

    e.g:

    WITH RECURSIVE connections_and_items AS (
        SELECT
             c.id AS connection_id,
             c.title AS connection_title,
             c.origin_item_id,
             i.title AS origin_item_title,
             c.destination_item_id,
             it.title AS destination_item_title,
             1 AS depth,
             c.origin_item_id AS parent_id
        FROM connections c
        JOIN items       i
          ON  i.id = c.origin_item_id
        JOIN items       it
          ON it.id = c.destination_item_id
        WHERE c.origin_item_id = '${itemId}'::UUID
    
        UNION 
            
        SELECT 
            c.connection_id,
            c.connection_title,
            c.origin_item_id,
            c.origin_item_title,
            c.destination_item_id,
            c.destination_item_title,
            ci.depth + 1 AS depth,
            ci.connection_id AS parent_id
        FROM connections_and_items c
        JOIN connected_items       ci
          ON ci.destination_item_id = c.origin_item_id
          OR ci.origin_item_id      = c.destination_item_id
        WHERE ci.depth < ${maxDepth}
    )
    

    and (untested) this may help on the nested JSON:

    SELECT 
        ci.connection_id, 
        json_build_object(
            'title', ci.connection_title,
            'origin_item', json_build_object(
                'id',    ci.origin_item_id,
                'title', ci.origin_item_title
            ),
            'destination_item', json_build_object(
                'id',    ci.destination_item_id,
                'title', ci.destination_item_title
            ),
            'children', json_agg(ci) FILTER (WHERE ci.parent_id = c.connection_id)
        ) connection
    FROM connected_items ci
    GROUP BY ci.connection_id,
        ci.connection_title,
        ci.origin_item_id,
        ci.origin_item_title,
        ci.destination_item_id,
        ci.destination_item_title
    
    Login or Signup to reply.
  2. You said "(I’m not 100% sure if I’ve covered all the cases with this example.)": and you are right, the JSON format you try to achieve is ambiguous when an item is origin or destination of several connections.
    Your recursive query can be the basis of a JSON like the one below: the main difference being that there is no mixing of nodes item/connection in the hierarchy traversal, which is the main difficulty of the format you try to achieve where you go from item (while your query is connection-driven) then go the connections and then back to items, and furthermore you have 2 hierarchy: the "from" and the "to" ones…

    You recursive query leads easily to the following kind of JSON (I have the code for ORACLE if you are interested: mainly after the recursive query, you need another CTE to calculate LAG and LEAD levels in the hierarchy + the JSON of the item node, and the final query will add the "children" node and use the lead/lag infos to close the hierarchy correctly, e.g. the correct number of "]}" required to close the "children" node),

    [
      {
        "conn_title" : "Same Author - Guy Delisle - 1",
        "org_title" : "Pyongyang",
        "dest_title" : "Shenzhen",
        "children" :
        [
          {
            "conn_title" : "Same Author - Guy Delisle - 2",
            "org_title" : "Burma",
            "dest_title" : "Pyongyang",
            "children" :
            [
              {
                "conn_title" : "Same Author - Guy Delisle - 1",
                "org_title" : "Pyongyang",
                "dest_title" : "Shenzhen"
              }
            ]
          },
          {
            "conn_title" : "Same Author - Guy Delisle - 3",
            "org_title" : "Jerusalem",
            "dest_title" : "Shenzhen",
            "children" :
            [
              {
                "conn_title" : "Same Author - Guy Delisle - 1",
                "org_title" : "Pyongyang",
                "dest_title" : "Shenzhen"
              },
              {
                "conn_title" : "Same Author - Guy Delisle - 4",
                "org_title" : "Hostage",
                "dest_title" : "Jerusalem",
                "children" :
                [
                  {
                    "conn_title" : "Same Author - Guy Delisle - 3",
                    "org_title" : "Jerusalem",
                    "dest_title" : "Shenzhen"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search