skip to Main Content

I have a table that has around 4,000,000 users who have connected their Epic Games and/or Discord accounts. This table has visible indexes for discord_id and epic_id.

I also have a Discord server that has 160,000 members. Not all of these members have connected an account, and therefore won’t appear in the table.

I would like to query the table for members in my server who have connected an Epic Games account. I imagine the basic way to do this would be…

SELECT discord_id, epic_id FROM users WHERE discord_id IN (?);

…where ? is an array of 160,000 potential discord_id. Is this the best approach?

I have seen the related question which discusses the reasons for performance loss of massive IN operators. However, the question does not provide alternatives — aside from the author chunking the query, or others suggesting changing variables.

Are there even alternatives or is this my only choice? Is it something that you would use in production?

Could I, for example, keep an up-to-date table of discord_id who are in my server, and then query the users table for discord_id that exists in the other table?

2

Answers


  1. Without more information about your schema and its structure, it would definitely give better performance to have a "cache" table listing which users are in your server (using the user_id, or the discord_id), maybe updating it via triggers.

    If epic_id is NULL if the user hasn’t linked their account, then you could filter those out as well.

    Login or Signup to reply.
  2. How did you generate the list of discord_ids?

    Assuming the information is in another table, JOIN to that table. This will be faster than the two steps (1: find list; 2: run big IN).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search