skip to Main Content

here are 5 rows of my column Genres in my table data

Classics, Fiction, Historical Fiction, School, Literature, Young Adult, Historical

Fantasy, Fiction, Young Adult, Magic, Childrens, Middle Grade, Classics

Classics, Fiction, Romance, Historical Fiction, Literature, Historical, Audiobook

Classics, Nonfiction, History, Biography, Memoir, Historical, Holocaust

Classics, Fiction, Dystopia, Fantasy, Politics, School, Literature

What syntax should i use to make sure that all rows only have either values:

Classics, Fiction, Literature, Young Adult, Fantasy

i am currently lost and all suggested solutions from the internet have resulted to nothing

2

Answers


  1. Here’s a solution:

    WITH cte AS (
      SELECT id, REPLACE(genres, ', ', ',') AS genres
      FROM data
    )
    UPDATE data JOIN cte USING (id)
    SET data.genres = CONCAT_WS(',',
        IF(FIND_IN_SET('Classics', cte.genres), 'Classics', NULL),
        IF(FIND_IN_SET('Fiction', cte.genres), 'Fiction', NULL),
        IF(FIND_IN_SET('Literature', cte.genres), 'Literature', NULL),
        IF(FIND_IN_SET('Young Adult', cte.genres), 'Young Adult', NULL),
        IF(FIND_IN_SET('Fantasy', cte.genres), 'Fantasy', NULL));
    

    Demo using MySQL 8.0: https://dbfiddle.uk/XpPxmhX1

    The better solution is to store each genre in a separate row of a second table. This is a standard way to store many-to-many relationships.


    Re your comment:

    is there a solution that takes into account the ordering?

    If the ordering is significant, then I suppose the potential duplicates are also significant.

    There could be a solution, but the more important point is that there shouldn’t have to be.

    Store the data in a normalized fashion, and it will be a lot easier. You won’t have to solve hard problems like manipulating comma-separated lists.

    Here’s an example:

    CREATE TABLE genres (
      genre VARCHAR(20) PRIMARY KEY)
    );
    
    INSERT INTO genres VALUES ('Classics'), ('Fiction'), ('Literature'), ('Young Adult'), ('Fantasy');
    
    CREATE TABLE data_genres (
      data_id INT NOT NULL,
      ordinal INT NOT NULL,
      genre VARCHAR(20) NOT NULL,
      PRIMARY KEY (data_id, ordinal)
      FOREIGN KEY (genre) REFERENCES genres(genre) 
        ON DELETE CASCADE ON UPDATE CASCADE
    );
    
    INSERT INTO data_genres VALUES
    (1, 1, 'Classics'),
    (1, 2, 'Fiction'),
    (1, 3, 'Literature'),
    (1, 4, 'Young Adult');
    

    This stores one genre value per row, and only the values currently enumerated by the genres table. You can store them in any order you want in the data_genre table. It also permits duplicates if that’s desired.

    If you want to change the list of genres, then add or remove a row to the genres table.

    Login or Signup to reply.
  2. A normalized table could look like:

    CREATE TABLE NewGenres
    WITH RECURSIVE nrs as (
      SELECT 1 as a
      UNION ALL
      SELECT a+1 FROM nrs WHERE a<20)
    ,Genre as (
      SELECT R, a, trim(substring_index(substring_index(Genres,',',a),',',-1)) as Genre
      FROM nrs
      CROSS JOIN (SELECT row_number() over (order by Genres) R, Genres FROM mytable)  g
      WHERE a <= LENGTH(Genres)-LENGTH(REPLACE(Genres,',',''))+1
    )
    SELECT * 
    FROM Genre
    ORDER BY R,a;
    
    SELECT * FROM NewGenres;
    

    and used like this:

    SELECT 
        R, 
        sum(CASE WHEN Genre in ('Classics', 'Fiction', 'Literature', 'Young Adult', 'Fantasy') then 1 else 0 end) as NrofGenresInList,
        count(*) NumberOfGenres,
        GROUP_CONCAT(Genre ORDER BY Genre)  as ListOfGenres
    FROM NewGenres
    GROUP BY R
    ;
    

    see: DBFIDDLE

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search