(I had originally asked this question on the DBA StackExchange, but didn’t get much activity in it).
In my DB, I have tables for items
and connections
between them, creating a likely sparse graph, with islands:
CREATE TABLE IF NOT EXISTS items (
--------------------------------------------------------
id UUID NOT NULL DEFAULT uuid_generate_v4(),
--------------------------------------------------------
title VARCHAR(300) NOT NULL,
--------------------------------------------------------
CONSTRAINT items_pk
PRIMARY KEY (id)
--------------------------------------------------------
);
CREATE TABLE IF NOT EXISTS connections (
---------------------------------------------------------------------
id UUID NOT NULL DEFAULT uuid_generate_v4(),
---------------------------------------------------------------------
origin_item_id UUID NOT NULL,
destination_item_id UUID NOT NULL,
---------------------------------------------------------------------
title VARCHAR(300) NOT NULL,
---------------------------------------------------------------------
CONSTRAINT origin_item_fk
FOREIGN KEY (origin_item_id)
REFERENCES items (id),
CONSTRAINT destination_item_fk
FOREIGN KEY (destination_item_id)
REFERENCES items (id),
CONSTRAINT connections_pk
PRIMARY KEY (id)
---------------------------------------------------------------------
);
I’m using node-pg
, and I would like to do a breadth traversal from an inital seed (e.g. url parameter on ExpressJS), up to a certain depth (from what I see in the docs, the depth part is probably the easiest).
Ideally, I would like things to be returned in a nested way, but, from reading these docs (WITH
queries), I couldn’t figure out how, or if it is even possible.
INSERT INTO items (title)
VALUES ('Software'), ('Pyongyang'), ('Burma'), ('Shenzhen'), ('Jerusalem'), ('Hostage');
INSERT INTO connections (origin_item_id, destination_item_id, title)
VALUES
(
(SELECT id FROM items WHERE title ~ 'Pyongyang'),
(SELECT id FROM items WHERE title ~ 'Shenzhen'),
'Same Author - Guy Delisle - 1'
),
(
(SELECT id FROM items WHERE title ~ 'Burma'),
(SELECT id FROM items WHERE title ~ 'Pyongyang'),
'Same Author - Guy Delisle - 2'
),
(
(SELECT id FROM items WHERE title ~ 'Jerusalem'),
(SELECT id FROM items WHERE title ~ 'Shenzhen'),
'Same Author - Guy Delisle - 3'
),
(
(SELECT id FROM items WHERE title ~ 'Hostage'),
(SELECT id FROM items WHERE title ~ 'Jerusalem'),
'Same Author - Guy Delisle - 4'
),
The recursive query would then result in, for Pyongyang
and maxDepth = 2
— note neither Hostage nor Software should appear then —:
{
"title": "Pyongyang",
"connections": [
{
"title": "Same Author - Guy Delisle - 1",
"destination_item": {
"title": "Shenzhen",
"connections": [
{
"title": "Same Author - Guy Delisle - 2",
"origin_item": {
"title": "Jerusalem"
}
}
]
}
},
{
"title": "Same Author - Guy Delisle - 2",
"origin_item": {
"title": "Burma"
}
}
]
}
(I’m not 100% sure if I’ve covered all the cases with this example.)
Is this type of nesting even possible in PostgreSQL (with, say, jsonb_build_object
or jsonb_agg
)? How would I do it?
Here is what I’ve been able to do:
WITH RECURSIVE connections_and_items AS (
SELECT
c.id AS connection_id,
c.title AS connection_title,
c.origin_item_id,
i.title AS origin_item_title,
c.destination_item_id,
it.title AS destination_item_title
FROM connections c
JOIN items i
ON i.id = c.origin_item_id
JOIN items it
ON it.id = c.destination_item_id
),
connected_items AS (
SELECT *
FROM connections_and_items
WHERE origin_item_id = '${itemId}'::UUID
UNION
SELECT c.*
FROM connections_and_items c
JOIN connected_items ci
ON ci.destination_item_id = c.origin_item_id
OR ci.origin_item_id = c.destination_item_id
)
SELECT
ci.connection_id,
ci.connection_title,
jsonb_build_object(
'id', ci.origin_item_id,
'title', ci.origin_item_title
) origin_item,
jsonb_build_object(
'id', ci.destination_item_id,
'title', ci.destination_item_title
) destination_item
FROM connected_items ci
This actually does traverse the graph successfully, in both directions (!). However, I haven’t been able to add a maxDepth
constraint (having trouble detecting cycles), and the nesting is still not there.
So, in summary, here’s what I have and what I’m missing:
- ✔ Recursive Breadth Traversal
- ✗ Max Depth
- ✗ Nested JSON
- ✔ Filtering the Initial Seed
2
Answers
For a maximum depth restriction try the following logic (i.e. DIY level counter)
e.g:
and (untested) this may help on the nested JSON:
You said "(I’m not 100% sure if I’ve covered all the cases with this example.)": and you are right, the JSON format you try to achieve is ambiguous when an item is origin or destination of several connections.
Your recursive query can be the basis of a JSON like the one below: the main difference being that there is no mixing of nodes item/connection in the hierarchy traversal, which is the main difficulty of the format you try to achieve where you go from item (while your query is connection-driven) then go the connections and then back to items, and furthermore you have 2 hierarchy: the "from" and the "to" ones…
You recursive query leads easily to the following kind of JSON (I have the code for ORACLE if you are interested: mainly after the recursive query, you need another CTE to calculate LAG and LEAD levels in the hierarchy + the JSON of the item node, and the final query will add the "children" node and use the lead/lag infos to close the hierarchy correctly, e.g. the correct number of "]}" required to close the "children" node),