I have a table that has around 4,000,000 users who have connected their Epic Games and/or Discord accounts. This table has visible indexes for discord_id
and epic_id
.
I also have a Discord server that has 160,000 members. Not all of these members have connected an account, and therefore won’t appear in the table.
I would like to query the table for members in my server who have connected an Epic Games account. I imagine the basic way to do this would be…
SELECT discord_id, epic_id FROM users WHERE discord_id IN (?);
…where ?
is an array of 160,000 potential discord_id
. Is this the best approach?
I have seen the related question which discusses the reasons for performance loss of massive IN
operators. However, the question does not provide alternatives — aside from the author chunking the query, or others suggesting changing variables.
Are there even alternatives or is this my only choice? Is it something that you would use in production?
Could I, for example, keep an up-to-date table of discord_id
who are in my server, and then query the users
table for discord_id
that exists in the other table?
2
Answers
Without more information about your schema and its structure, it would definitely give better performance to have a "cache" table listing which users are in your server (using the user_id, or the discord_id), maybe updating it via triggers.
If
epic_id
isNULL
if the user hasn’t linked their account, then you could filter those out as well.How did you generate the list of
discord_ids
?Assuming the information is in another table,
JOIN
to that table. This will be faster than the two steps (1: find list; 2: run big IN).