skip to Main Content
  1. I am trying to find all the rows in my table, which a certain column is duplicated in them. and identify them by the unique id column that each row has.

  2. I want to delete them and leave only one copy(row), it doesn’t matter which one. (if it can be done through SQL, it is preferred, if note, i will code it..)

(I’m using workbench)

I tried this:

SELECT 
  *
FROM
  table_name
GROUP BY `field_name`
HAVING COUNT(*) > 1;

It failed.
My first problem was the sql_mode=only_full_group_by, so I fixed it by adding field names to the query. and it worked.. well, sort of. Since now I saw only the aggregated result without the id column.

SELECT 
    field_name
FROM
    table_name
GROUP BY `field_name`
HAVING COUNT(*) > 1;

So, here I am..
How can I do it?

2

Answers


  1. If I follow you correctly, you can use window functions:

    select *
    from (
        select t.*, count(*) over(partition by field_name) cnt
        from mytable t
    ) t
    where cnt > 1
    

    For each row, the subquery does a window count that computes how many rows share the same field_name value. Then, the outer query filters on rows whose field_name is not unique.

    Note that this requires MySQL 8.0. In earlier version, an alternative uses exists:

    select *
    from mytable t
    where exists (
        select 1 
        from mytable t1 
        where t1.field_value = t.field_value and t1.id != t.id
    )
    
    Login or Signup to reply.
  2. If I understand correctly, I’d do this with a sub query. The inner query finds all the column names that are duplicated. Take the name from that query, then pass it into the outer query to find the object id for those names.

    You’re using the system tables for this query which sometimes have elevated permissions.

    I just read that you’re using MySQL. This query works in MS SQL, so it may not work exactly as written, but I think the concept is solid so I’ll post it.

    --outer query
    SELECT 
        object_id
      , name
    FROM    
      sys.columns c
    WHERE name in (
    --Inner Query
        SELECT  Name
        FROM
            sys.columns o
         GROUP BY Name
        HAVING COUNT(*) > 1
    ) 
    ORDER BY 2
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search