skip to Main Content
create table A {
   id   text,
   val  text }

create table B {
   val0 text, 
   val1 text,
   val2 text }

id and val of table A contain each a large, but discrete number of values in an many-to-many relationship. Sample data looks like follows:

id val
one a
one b
one y
two a
two d
two e
two x
three c
four c
four f
four z
four g

The second table B contains combinations from the same set as those for val, for example:

val0 val1 val2
a b c
a d e
c f g
z f c

I am trying to find a query that returns those, and only those, id‘s of table A for which there are val that cover at least one entire row from table B.

In the example above, the query should return id "two", because that id-group contains all values from the second row in table B, i.e. "a", "d", and "e". It is irrelevant that "x" is contained in that group as well. The query also should return "four", as the values in that group cover now rows 3 and 4 in table B. The order in table A or table B is not important.

I managed to do the query as a procedure. What is a concise SQL query that works efficiently with tables with millions of rows?

Post scriptum:
Both answers provided by @Ajax1234 and @GMB (second answer that does not introduce a pk) work well. However, the left-join based solution by @Ajax1234 was significantly more performant on the real dataset: it took about 9 seconds for @Ajax1234 solution vs. more than a minute for the solution by @GMB

2

Answers


  1. I would suggest a relation division approach here.

    Normally you would have a primary key in the second table, say pk, so we could do:

    select a.id, b.pk
    from tablea a
    inner join tableb b on a.val in (b.val0, b.val1, b.val2)
    group by a.id, b.pk 
    having count(*) = 3
    

    Note that this assumes no duplicates id/val tuples in the first table, otherwise we need count(distinct a.val) instead.

    An alternative is a lateral join (which also does not require a pk in the second table):

    select b.*, a.id
    from tableb b
    cross join lateral (
        select a.id
        from tablea a 
        where a.val in (b.val0, b.val1, b.val2)
        group by a.id
        having count(*) = 3
    ) a
    
    Login or Signup to reply.
  2. Using a series of left joins:

    select distinct a.id 
    from B b 
    left join A a on b.val0 = a.val 
    left join A a1 on b.val1 = a1.val and a.id = a1.id 
    left join A a2 on b.val2 = a2.val and a2.id = a.id 
    where a.id is not null and a1.id is not null and a2.id is not null
    

    See fiddle.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search