I have two tables, table 2 has one row per variable, while table 1 has many rows per variable. I want to randomly select 1000 rows from table 2 (1000 variables) and then join those to table 1. Final table will have many rows per variable. I’m also adding conditions for the selection from both tables. This is my code but it’s not picking 1000 variables from table 2:
SELECT t1.var1, t1.var2, t2.var3, t2.var4
FROM table1 t1
RIGHT JOIN (SELECT var1, var3 FROM table2 ORDER BY RAND() LIMIT 1000) t2
ON t1.var1 = t2.var2
WHERE t1.var2 < 90 AND t2.var4 = "asd";
What am I missing?
Also, I’ve tested this putting a limit of 10 variables, and sometimes I get zero rows…
2
Answers
Your query could easily filter out all results.
This selects the 1000 random rows before testing for the rows where
t2.var4 = 'asd'
. I assume not all rows match that condition. So it’s possible that none of the 1000 randomly selected rows match that condition.Also you are testing
t1.var2 < 90
after the outer join, which means only rows oft1
that have at least one corresponding row oft2
matching that range condition will be included. This also creates the possibility that the query returns no results, because not all rows oft1
have such corresponding rows oft2
.You should be able to fix both problems.
This puts the condition on
t2.var4
inside the subquery, so the subquery returns 1000 randomly selected rows from the subset of rows that match.Also putting the condition on
t1.var2
into the join condition means that rows oft2
will be returned even if they they have no corresponding rows oft1
that match that condition.RIGHT JOIN
is commonly avoided, because right-join queries are way harder to read than left-join queries. Your query equals:In an outer joined row all t1 values are
NULL
, soWHERE t1.var2 < 90
dismisses these rows. Your query is henceNow, what does this query do? It first picks 1000 random t2 rows. Then it dismisses all that don’t match
t2.var4 = 'asd'
. It also joins t1 rows matchingt1.var2 < 90
and dismisses all t2 rows that don’t have such a match. This leaves you with 0 to 1000 t2 rows / variables and their matches – how many different variables is left to chance.It seems you rather want:
i.e., 1000 vars from table 2 whether they have t1 matches or not.
Or
i.e., 1000 vars from table 2 that have t1 matches.