There is a set of users. A person can have multiple users, but `ref1` and `ref2` might be alike and can therefore link users together. `ref1` and `ref2` do not overlap: a value in `ref1` does not exist in `ref2`.
A user can own multiple assets. I want to "merge" users that have one or more refs in common and then count how many assets they own together. There could be missing entries in the user table; in that case I just want to propagate the owner into ref2 and set the asset_count and asset_ids.
Here is an example schema to illustrate:
Example assets
SELECT * FROM assets;
id | name | owner |
---|---|---|
1 | #1 | a |
2 | #2 | b |
3 | #3 | c |
4 | #4 | a |
5 | #5 | c |
6 | #6 | d |
7 | #7 | e |
8 | #8 | d |
9 | #9 | a |
10 | #10 | a |
11 | #11 | z |
Example users
SELECT * FROM users;
id | username | ref1 | ref2 |
---|---|---|---|
1 | bobo | a | d |
2 | toto | b | e |
3 | momo | c | d |
4 | lolo | a | f |
5 | popo | c | f |
What I want to get in the end
SELECT * FROM results;
ids | usernames | refs1 | refs2 | asset_ids | asset_count |
---|---|---|---|---|---|
1,3,4,5 | bobo,momo,lolo,popo | a,c | d,f | 1,3,4,5,6,8,9,10 | 8 |
2 | toto | b | e | 2,7 | 2 |
 | | | z | 11 | 1 |
I’ve tried different approaches, but this is what I currently have:
Closest I have got
SELECT
ARRAY_AGG(DISTINCT u.id) AS ids,
ARRAY_AGG(DISTINCT u.username) AS usernames,
ARRAY_AGG(DISTINCT u.ref1) AS refs1,
ARRAY_AGG(DISTINCT u.ref2) AS refs2,
COUNT(DISTINCT a.id) AS asset_count
FROM assets a
JOIN users u ON a.owner = u.ref1 OR a.owner = u.ref2
GROUP BY a.owner
ORDER BY MIN(a.id);
ids | usernames | refs1 | refs2 | asset_count |
---|---|---|---|---|
1,4 | bobo,lolo | a | d,f | 4 |
2 | toto | b | e | 1 |
3,5 | momo,popo | c | d,f | 2 |
1,3 | bobo,momo | a,c | d | 2 |
2 | toto | b | e | 1 |
If I merge the above table on ids, I almost get the result I want (without the missing entries in the user table). The merging can easily be done in code, but then I cannot paginate etc. I want to do this in the DB layer if possible.
I want either a solution to the problem or a good explanation of why it is not possible to do (with examples).
Please check out my DB Fiddle.
2 Answers
Please look at the following solution:
https://sqlize.online/sql/psql11/88ab227ab4a34c532fc711cff533f272/
It returns almost the desired result. You only need to deduplicate the asset_ids array.
There are two distinct parts to the question:
Part 1 : a graph-walking problem
Identifying clusters of users based on common references reads like a graph-walking problem. That's a complex task in SQL, and requires a recursive query. The pattern is to unpivot users' references to generate nodes, then identify edges (nodes that have a ref in common), and finally walk through the graph (without looping) to generate groups.
In Postgres, arrays come in handy to aggregate nodes.
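The pattern described above could be sketched roughly as follows. This is only an illustration, not necessarily the exact query behind the demo: it assumes the `users(id, username, ref1, ref2)` table from the question, that every user has at least one non-NULL ref, and the CTE names (`nodes`, `edges`, `walk`, `user_groups`) are made up for the example.

```sql
-- Sketch only: assumes users(id, username, ref1, ref2) as in the question,
-- with at least one non-NULL ref per user. CTE names are illustrative.
WITH RECURSIVE
nodes AS (
    -- unpivot: one row per (user, ref)
    SELECT u.id, r.ref
    FROM users u
    CROSS JOIN LATERAL (VALUES (u.ref1), (u.ref2)) AS r(ref)
),
edges AS (
    -- two users are connected when they share a ref
    -- (this includes the trivial self-edge id1 = id2)
    SELECT DISTINCT n1.id AS id1, n2.id AS id2
    FROM nodes n1
    JOIN nodes n2 ON n1.ref = n2.ref
),
walk AS (
    -- start from each user, remembering the path in an array
    SELECT id1, id2, ARRAY[id1] AS visited
    FROM edges
    WHERE id1 = id2
    UNION ALL
    -- follow an edge only if it leads to a user not yet on the path
    SELECT w.id1, e.id2, w.visited || e.id2
    FROM walk w
    JOIN edges e ON e.id1 = w.id2
    WHERE e.id2 <> ALL (w.visited)
),
user_groups AS (
    -- every user reachable from id1 belongs to the same group
    SELECT id1 AS id, ARRAY_AGG(DISTINCT id2 ORDER BY id2) AS ids
    FROM walk
    GROUP BY id1
)
SELECT * FROM user_groups ORDER BY id;
```

With the sample data, users 1, 3, 4 and 5 should end up sharing one group and user 2 should be alone. Note that this walk enumerates paths, so it can get expensive when clusters are large and densely connected.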
Part 2 : `left join` and aggregation
Now that we identified the groups, we can check for assets. Since you want all assets in the result, we start from the `assets` table, then bring the users and the groups with `left join`s. We can still `group by` the user groups, which puts all orphan assets in the same group (where the group is `null`), which is exactly what we want.
The last step is array aggregation; the "propagation" of the owners of orphan assets to `ref2` can be handled with a `case` expression.
Demo on DB Fiddle
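The join-and-aggregate step described in Part 2 might look like the sketch below. It repeats the group-building CTEs so that it can run on its own. As before, the CTE names are illustrative, and it assumes the `users` and `assets` tables from the question, with `owner`, `ref1` and `ref2` sharing the same type.

```sql
-- Sketch only: assumes the users and assets tables from the question.
WITH RECURSIVE
nodes AS (
    SELECT u.id, r.ref
    FROM users u
    CROSS JOIN LATERAL (VALUES (u.ref1), (u.ref2)) AS r(ref)
),
edges AS (
    SELECT DISTINCT n1.id AS id1, n2.id AS id2
    FROM nodes n1
    JOIN nodes n2 ON n1.ref = n2.ref
),
walk AS (
    SELECT id1, id2, ARRAY[id1] AS visited
    FROM edges
    WHERE id1 = id2
    UNION ALL
    SELECT w.id1, e.id2, w.visited || e.id2
    FROM walk w
    JOIN edges e ON e.id1 = w.id2
    WHERE e.id2 <> ALL (w.visited)
),
user_groups AS (
    SELECT id1 AS id, ARRAY_AGG(DISTINCT id2 ORDER BY id2) AS ids
    FROM walk
    GROUP BY id1
)
SELECT
    g.ids,
    ARRAY_AGG(DISTINCT u.username) AS usernames,
    ARRAY_AGG(DISTINCT u.ref1)     AS refs1,
    -- orphan assets have no group: propagate their owner into refs2
    CASE
        WHEN g.ids IS NULL THEN ARRAY_AGG(DISTINCT a.owner)
        ELSE ARRAY_AGG(DISTINCT u.ref2)
    END                            AS refs2,
    ARRAY_AGG(DISTINCT a.id)       AS asset_ids,
    COUNT(DISTINCT a.id)           AS asset_count
FROM assets a
LEFT JOIN users u       ON a.owner IN (u.ref1, u.ref2)  -- keep every asset
LEFT JOIN user_groups g ON g.id = u.id
GROUP BY g.ids                                          -- NULL = orphan assets
ORDER BY MIN(a.id);
```

For the orphan group, `usernames` and `refs1` come back as arrays holding a single NULL rather than as empty values. If that matters, adding `FILTER (WHERE u.id IS NOT NULL)` to those aggregates is one way to return NULL instead.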