skip to Main Content

I have a table which links the IDs of war memorials with the IDs of records of soldiers. It has a unique index based on the combination of the two IDs (sample data below).

I have recently started storing the source of these connections, typically a URL. What I want to do is remove all lines that have no source except those where I still have no source stored.

By way of example

id | memorialId | soldierId | source
1  | 1          | 1         | Source 1
2  | 1          | 1         | Source 2
3  | 1          | 1         |
4  | 2          | 1         | Source 3
5  | 2          | 1         |
6  | 3          | 2         | Source 4
7  | 4          | 3         |
8  | 5          | 4         | Source 1
9  | 5          | 4         | Source 2

This should end up as

memorialId | soldierId | source
id | memorialId | soldierId | source
1  | 1          | 1         | Source 1
2  | 1          | 1         | Source 2
4  | 2          | 1         | Source 3
6  | 3          | 2         | Source 4
7  | 4          | 3         |
8  | 5          | 4         | Source 1
9  | 5          | 4         | Source 2

Line 3 is deleted because there is no source, and lines 1 & 2 have the same connection but with sources; similarly Line 5 because Line 4 has a source for that connection.

Line 6 remains as it is the only line defining that connection.

Line 7 remains for the same reason, even though it has no source.

Lines 8 & 9 are both kept as it’s two distinct sources for the same connection – I only want to delete lines where the source is empty.

The closest I have got so far is a query that identifies which lines need to be deleted

SELECT id,memorialId,soldierId,source FROM myTable 
GROUP BY memorialId,soldierId 
HAVING COUNT(*) > 1 AND min(source) = '';

This identifies the two lines for deletion, but if I then wrap that in a delete statement like

Delete from 
myTable where exists (select 1 FROM myTable 
GROUP BY memorialId,soldierId 
HAVING COUNT(*) > 1 AND min(source) = '');

I get a message saying that I can’t update the same table that’s in the FROM clause.

I’m probably missing something obvious as I only have basic knwowledge of mysql I’m afraid.

2

Answers


  1. https://dev.mysql.com/doc/refman/en/subquery-restrictions.html says:

    In general, you cannot modify a table and select from the same table in a subquery.

    But you can do joins in UPDATE and DELETE, and reference the table more than once.

    I’d do it this way:

    DELETE t1 FROM myTable AS t1
    JOIN myTable AS t2 USING (memorialId, soldierId)
    WHERE t1.source = '' AND t2.source <> '';
    

    The join is satisfied only if t1 has a blank source, and at least one row t2 does have the same id’s and does have a non-blank source.

    Demo: https://www.db-fiddle.com/f/gyyTPJiRmhzWh4gSFgbqA4/0

    Login or Signup to reply.
  2. You will need to use it as an implicit temp table try:

    DELETE FROM myTable
    WHERE id IN (SELECT t.id from (select id FROM myTable 
    GROUP BY memorialId,soldierId 
    HAVING COUNT(*) > 1 AND min(source) = '') t
    )
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search