How do I remove other substrings in a string in mySQL?

h1y2eons
March 2, 2024
162 views
0 votes
2 Answers

here are 5 rows of my column Genres in my table data

Classics, Fiction, Historical Fiction, School, Literature, Young Adult, Historical

Fantasy, Fiction, Young Adult, Magic, Childrens, Middle Grade, Classics

Classics, Fiction, Romance, Historical Fiction, Literature, Historical, Audiobook

Classics, Nonfiction, History, Biography, Memoir, Historical, Holocaust

Classics, Fiction, Dystopia, Fantasy, Politics, School, Literature

What syntax should i use to make sure that all rows only have either values:

Classics, Fiction, Literature, Young Adult, Fantasy

i am currently lost and all suggested solutions from the internet have resulted to nothing

Tags: mysql

Answers

- BillKarwin
- March 2, 2024 at 7:49 pm
- 0 votes
0
Here’s a solution:
```
WITH cte AS (
  SELECT id, REPLACE(genres, ', ', ',') AS genres
  FROM data
)
UPDATE data JOIN cte USING (id)
SET data.genres = CONCAT_WS(',',
    IF(FIND_IN_SET('Classics', cte.genres), 'Classics', NULL),
    IF(FIND_IN_SET('Fiction', cte.genres), 'Fiction', NULL),
    IF(FIND_IN_SET('Literature', cte.genres), 'Literature', NULL),
    IF(FIND_IN_SET('Young Adult', cte.genres), 'Young Adult', NULL),
    IF(FIND_IN_SET('Fantasy', cte.genres), 'Fantasy', NULL));
```
Demo using MySQL 8.0: https://dbfiddle.uk/XpPxmhX1

The better solution is to store each genre in a separate row of a second table. This is a standard way to store many-to-many relationships.

Re your comment:

is there a solution that takes into account the ordering?

If the ordering is significant, then I suppose the potential duplicates are also significant.

There could be a solution, but the more important point is that there shouldn’t have to be.

Store the data in a normalized fashion, and it will be a lot easier. You won’t have to solve hard problems like manipulating comma-separated lists.

Here’s an example:
```
CREATE TABLE genres (
  genre VARCHAR(20) PRIMARY KEY)
);

INSERT INTO genres VALUES ('Classics'), ('Fiction'), ('Literature'), ('Young Adult'), ('Fantasy');

CREATE TABLE data_genres (
  data_id INT NOT NULL,
  ordinal INT NOT NULL,
  genre VARCHAR(20) NOT NULL,
  PRIMARY KEY (data_id, ordinal)
  FOREIGN KEY (genre) REFERENCES genres(genre) 
    ON DELETE CASCADE ON UPDATE CASCADE
);

INSERT INTO data_genres VALUES
(1, 1, 'Classics'),
(1, 2, 'Fiction'),
(1, 3, 'Literature'),
(1, 4, 'Young Adult');
```
This stores one genre value per row, and only the values currently enumerated by the genres table. You can store them in any order you want in the data_genre table. It also permits duplicates if that’s desired.

If you want to change the list of genres, then add or remove a row to the genres table.
Login or Signup to reply.

A normalized table could look like:

CREATE TABLE NewGenres
WITH RECURSIVE nrs as (
  SELECT 1 as a
  UNION ALL
  SELECT a+1 FROM nrs WHERE a<20)
,Genre as (
  SELECT R, a, trim(substring_index(substring_index(Genres,',',a),',',-1)) as Genre
  FROM nrs
  CROSS JOIN (SELECT row_number() over (order by Genres) R, Genres FROM mytable)  g
  WHERE a <= LENGTH(Genres)-LENGTH(REPLACE(Genres,',',''))+1
)
SELECT * 
FROM Genre
ORDER BY R,a;

SELECT * FROM NewGenres;

and used like this:

SELECT 
    R, 
    sum(CASE WHEN Genre in ('Classics', 'Fiction', 'Literature', 'Young Adult', 'Fantasy') then 1 else 0 end) as NrofGenresInList,
    count(*) NumberOfGenres,
    GROUP_CONCAT(Genre ORDER BY Genre)  as ListOfGenres
FROM NewGenres
GROUP BY R
;

see: DBFIDDLE

Please signup or login to give your own answer.

Click here to cancel reply.