Can Postgresql merge users w/ multiple refs & count their collective assets?

Bj248rnBr229then
May 29, 2023
338 views
0 votes
2 Answers

There is a set of users. A person can have multiple users, but ref1 and ref2 might be alike and can therefore link users together. ref1 and ref2 does not overlap, one value in ref1 does not exist in ref2.

A user can own multiple assets. I want to "merge" users that has one or more refs alike and then count how many assets they own together. There could be missing entries in the user table, in that case I just want to propagate the owner into ref2 and set the asset_count and asset_ids.

Here is an example schema to illustrate:

Example assets

SELECT * FROM assets;

id	name	owner
1	#1	a
2	#2	b
3	#3	c
4	#4	a
5	#5	c
6	#6	d
7	#7	e
8	#8	d
9	#9	a
10	#10	a
11	#11	z

Example users

SELECT * FROM users;

id	username	ref1	ref2
1	bobo	a	d
2	toto	b	e
3	momo	c	d
4	lolo	a	f
5	popo	c	f

What I want to get in the end

SELECT * FROM results;

ids	usernames	refs1	refs2	asset_ids	asset_count
1,3,4,5	bobo,momo,lolo,popo	a,c	d,f	1,3,4,5,6,8,9,10	8
2	toto	b	e	2,7	2
			z	11	1

I’ve tried different approaches, but this is what I currently have:

Closest I have got

SELECT
  ARRAY_AGG(DISTINCT u.id) AS ids,
  ARRAY_AGG(DISTINCT u.username) AS usernames,
  ARRAY_AGG(DISTINCT u.ref1) AS refs1,
  ARRAY_AGG(DISTINCT u.ref2) AS refs2,
  COUNT(DISTINCT a.id) AS asset_count
FROM assets a
JOIN users u ON a.owner = u.ref1 OR a.owner = u.ref2
GROUP BY a.owner
ORDER BY MIN(a.id);

ids	usernames	refs1	refs2	asset_count
1,4	bobo,lolo	a	d,f	4
2	toto	b	e	1
3,5	momo,popo	c	d,f	2
1,3	bobo,momo	a,c	d	2
2	toto	b	e	1

If I merge the above table on ids, I almost get the result I want (without the missing entries in the user table). The merging can easily be done in code, but then I cannot paginate etc. I want to to this in DB layer if possible.

I want either a solution to the problem or a good explanation of why it is not possible to do (with examples).

Please check out my DB Fiddle.

Answers

Please look next solution:

WITH agg1 as (
    SELECT 
      string_agg(distinct users.id::text, ',') id, 
      string_agg(distinct username, ',') username, 
      string_agg(distinct ref2, ',') ref2,
      string_agg(distinct assets.id::text, ',') assets_id,
      sum(assets_count) assets_count,
      ref1
    FROM users
    JOIN (
      SELECT string_agg(id::text, ',') id, count(*) assets_count, owner 
      FROM assets GROUP BY owner
    ) assets ON assets.owner in (ref1, ref2)
    GROUP BY ref1
) SELECT 
    string_to_array(string_agg(id::text, ','),',') ids, 
    string_to_array(string_agg(username, ','),',') usernames, 
    string_to_array(string_agg(ref1, ','),',') refs1,
    string_to_array(ref2,',') refs2,
    string_to_array(string_agg(assets_id, ','),',') assets_id,
    min(assets_count) assets_count
FROM agg1
GROUP BY agg1.ref2
;

https://sqlize.online/sql/psql11/88ab227ab4a34c532fc711cff533f272/

It returns almost desired result. You only need to unique assets_id array

- GMB
- May 28, 2023 at 11:37 pm
- 0 votes
0
There are two distinct parts to the question:
- the first one is how to generate groups of users that have common references
- the second part is how to distribute the assets in the groups, while taking in account orphan assets
Part 1 : a graph-walking problem

Identifying clusters of users based on common references reads like a graph-walking problem. That’s a complex task in SQL, and requires a recursive query. The pattern is to unpivot users’ references to generate nodes, then identify edges (nodes that have a ref in common), and finally walk through the graph (whitout looping) to generate groups.

In Postgres, arrays come handy to aggregate nodes:
```
with recursive 
    nodes as (
        select u.id, r.ref
        from users u 
        cross join lateral ( values (u.ref1), (u.ref2) ) r(ref)
    ),
    edges as (
        select distinct n1.id as id1, n2.id as id2
        from nodes n1 
        inner join nodes n2 on n1.ref = n2.ref
    ),
    rcte as (
        select id1, id2, array[id1] as visited from edges where id1 = id2
        union all
        select r.id1, e.id2, r.visited || e.id2
        from rcte r
        inner join edges e on e.id1 = r.id2
        where e.id2 <> all(r.visited) 
    ),
    groups as (
        select id1 as id, array_agg(distinct id2 order by id2) as ids
        from rcte
        group by id1
    )
select * from groups order by id
```
id ids

1 {1,3,4,5}

2 {2}

3 {1,3,4,5}

4 {1,3,4,5}

5 {1,3,4,5}

Part 2 : left join and aggregation

Now that we identified the groups, we can check for assets. Since you want all assets in the result, we start from the assets table, then bring the users and the groups with left joins. We can still group by the user groups, which puts all orphan assets in the same group (where the group is null) – that’s exactly what we want.

The last step is array aggregation; the "propagation" of the owners of orphan assets to ref2 can be handled with a case expression.
```
with recursive 
    nodes  as (...),
    edges  as (...),
    rcte   as (...),
    groups as (...)
select g.ids,
    array_agg(distinct u.username) as usernames,
    array_agg(distinct u.ref1) as refs1,
    case when g.ids is null then array_agg(distinct a.owner) else array_agg(distinct u.ref2) end as refs2,
    array_agg(distinct a.id) as asset_ids,
    count(distinct a.id) as asset_count
from assets a
left join users u on a.owner in (u.ref1, u.ref2)
left join groups g on g.id = u.id
group by g.ids
```
ids usernames refs1 refs2 asset_ids asset_count

{1,3,4,5} {bobo,lolo,momo,popo} {a,c} {d,f} {1,3,4,5,6,8,9,10} 8

{2} {toto} {b} {e} {2,7} 2

null {NULL} {NULL} {z} {11} 1

Demo on DB Fiddlde
Login or Signup to reply.

id	ids
1	{1,3,4,5}
2	{2}
3	{1,3,4,5}
4	{1,3,4,5}
5	{1,3,4,5}

ids	usernames	refs1	refs2	asset_ids	asset_count
{1,3,4,5}	{bobo,lolo,momo,popo}	{a,c}	{d,f}	{1,3,4,5,6,8,9,10}	8
{2}	{toto}	{b}	{e}	{2,7}	2
null	{NULL}	{NULL}	{z}	{11}	1

Please signup or login to give your own answer.

Click here to cancel reply.

Can Postgresql merge users w/ multiple refs & count their collective assets?

Answers

Part 1 : a graph-walking problem

Part 2 : left join and aggregation

Part 2 : `left join` and aggregation