skip to Main Content

I have a MYSQL table called aa_ngl_defs of the following form:

id Word POS Definition
1 the article used to point to something already mentioned
2 be verb used to show the identity of a thing
3 and conj used to join words or groups of words
4 of prep belonging to or connected with something
5 to prep used to indicate place or direction
6 a article one particular thing or one of a class of things
7 in prep used to indicate being inside of or surrounded by something else
8 have verb to own, possess, or hold something
9 it pronoun a thing that has been previously mentioned
10 you pronoun used to refer to the person the speaker is addressing
11 he pronoun used to refer to a male person that is the subject
12 for prep indicating the purpose or need of something
13 they pronoun used to refer to two or more people, animals, or things
14 not adverb used to make an expression negative
15 that adj used to identify a specific person, thing, fact or idea
16 we pronoun used to refer to the speaker and another person as the subject
17 on prep touching and being supported or physically in contact by something

The table has unique contiguous integers in the id column.

I want to create a new table by adding 3 columns of definitions taken randomly from other rows within this table. This would allow me to create quizzes with 3 distractors. I need reasonable performance speed but not necessarily high speed, as the table has 3,500 rows but I only need to generate a new quiz once-a-week.

So that it looks like this:

id  Word    POS Correct_Definition   Distractor1    Distractor2   Distractor3

The online quiz generators Quizlet, Kahoot and Blooket are able to take a simple list like this and produce quizzes with 3 distractors. That’s what I’m trying to emulate.

The following works OK for adding 1 extra distractor definition:

    SELECT t1.id, t1.Word, t1.Definition, t2.Definition
    FROM aa_ngl_defs AS t1
    LEFT JOIN aa_ngl_defs AS t2 
      ON t1.id <> t2.id -- Ensure we don't join the row with itself
    ORDER BY RAND()
    LIMIT 3;

But I don’t know how to add two more distractor definitions, and also my query took 17s which I suspect will multiply exponentially if I try to add more columns.

I tried to incorporate the following to improve performance

    SELECT *
      FROM aa_ngl_defs AS r1 JOIN
        (SELECT (RAND() * (SELECT MAX(id) FROM aa_ngl_defs)) AS id
        )
            AS r2
     WHERE r1.id >= r2.id
    ORDER BY r1.id ASC
    LIMIT 1;

… and this works very fast to select one row at random, but I am completely lost as to how incorporate this to add the three distractors. I think I need nested select statements or perhaps variables.

Obviously, since each row will have 3 distractors there will be some repeats of distractors across rows, but I can’t have repeated distractors within any single row. Randomness only needs to be reasonably random, not strictly so.

2

Answers


  1. See if this works for you. You can move the random rows logic into a lateral join and then apply a row_number, followed by conditional aggregation to pivot into your required columns:

    select id, word, pos, Definition Correct_Definition,
      max(case when cn = 1 then rdef end) Distractor1,
      max(case when cn = 2 then rdef end) Distractor2,
      max(case when cn = 3 then rdef end) Distractor3
    from (
      select *, row_number() over(partition by t.id) cn
      from t
      join lateral (
          select definition rdef
          from t t2
          where t2.id != t.id
          order by rand()
          limit 3
      )r
    )t
    group by id, word, pos, definition;
    

    See this demo Fiddle

    Login or Signup to reply.
  2. This dbfiddle sometimes picks a duplicate distractor

    WITH cte AS (
      select 
        id,
        floor(rand()*17)+1 as R1,
        floor(rand()*17)+1 as R2,
        floor(rand()*17)+1 as R3
      from mytable
    ),
    cte2 as (select * from cte) 
    SELECT
      m.id,
      m.Word,
      m.POS,
      m.Definition,
      d1.id as id1,
      d1.Definition as Distractor1,
      d2.id as id2,
      d2.Definition as Distractor2,
      d3.id as id3,
      d3.Definition as Distractor3
    FROM mytable m
    INNER JOIN cte2 as c ON c.id = m.id
    INNER JOIN mytable d1 on d1.id = c.R1
    INNER JOIN mytable d2 on d2.id = c.R2
    INNER JOIN mytable d3 on d3.id = c.R3
    ORDER BY m.id
    

    The reason for picking duplicates is known, the reason of not returning 17 rows is not known (to me ).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search