create table A {
id text,
val text }
create table B {
val0 text,
val1 text,
val2 text }
id
and val
of table A
contain each a large, but discrete number of values in an many-to-many relationship. Sample data looks like follows:
id | val |
---|---|
one | a |
one | b |
one | y |
two | a |
two | d |
two | e |
two | x |
three | c |
four | c |
four | f |
four | z |
four | g |
The second table B
contains combinations from the same set as those for val
, for example:
val0 | val1 | val2 |
---|---|---|
a | b | c |
a | d | e |
c | f | g |
z | f | c |
I am trying to find a query that returns those, and only those, id
‘s of table A
for which there are val
that cover at least one entire row from table B
.
In the example above, the query should return id
"two", because that id-group contains all values from the second row in table B
, i.e. "a", "d", and "e". It is irrelevant that "x" is contained in that group as well. The query also should return "four", as the values in that group cover now rows 3 and 4 in table B
. The order in table A
or table B
is not important.
I managed to do the query as a procedure. What is a concise SQL query that works efficiently with tables with millions of rows?
Post scriptum:
Both answers provided by @Ajax1234 and @GMB (second answer that does not introduce a pk) work well. However, the left-join based solution by @Ajax1234 was significantly more performant on the real dataset: it took about 9 seconds for @Ajax1234 solution vs. more than a minute for the solution by @GMB
2
Answers
I would suggest a relation division approach here.
Normally you would have a primary key in the second table, say
pk
, so we could do:Note that this assumes no duplicates id/val tuples in the first table, otherwise we need
count(distinct a.val)
instead.An alternative is a lateral join (which also does not require a pk in the second table):
Using a series of
left join
s:See fiddle.