skip to Main Content

I have a table with a unique constraint over 5 columns, all nullable=false. These columns are varchar(30), varchar, 3 float8 columns. There is an autoincrement id column, but that is not included in the constraint.

Often times I have duplicates and so my query has "…on conflict do nothing". I have seen a batch insert of rows I that I to know have all duplicates, and so I expect to have 0 affected rows. Instead I have some dupes inserted (From what i have seen it’s about 5-15%, it’s not constant).

After this, I queried the table with a select distinct across only all columns in the constraint, and postgres will returned the duplicate rows with duplicate values. Same goes for a select query with GROUP BY on all columns in the constraint. I expect the dupe rows to collapse into one, but instead I see both duplicate rows returned to me.

However, when I manually delete one of the two duplicate rows, and then try to insert the single duplicate row then postgresql will rightly tell me there is a conflict and i wont be able to insert.

Am I missing something? My code is now adjusted to insert rows one by one, but of course that is a lot less efficient.

2

Answers


  1. Your experiments with GROUP BY and DISTINCT prove that the rows are not really identical. I can think of two likely causes:

    1. The double precision columns look identical (in all significant digits), but aren’t really. It is always very problematic to perform equality comparisons on floating point numbers. Try setting extra_float_digits = 3 and see if you can spot the difference.

    2. Some of the strings look equal, but aren’t. Popular examples are (space) and   (no-break space), M (ASCII letter) and Μ (Greek upper case letter Mu) or ä (a diaeresis) and (a and combining diaeresis—two code points that together form a character).

    My money is on 1.

    Login or Signup to reply.
    1. Use quote_literal(), ascii(), md5(), sha256(v::bytea) when you inspect text values: demo
      select v,quote_literal(v),ascii(v),md5(v),sha256(v::bytea) 
      from (values 
        (''   ),
        (' '  ),
        (E't')) AS _(v);
      
      v quote_literal ascii md5 sha256
      0 d41d8cd98f00b204e9800998ecf8427e xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
      ‘ ‘ 32 7215ee9c7d9dc229d2921a40e899ec5f x36a9e7f1c95b82ffb99743e0c5c4ce95d83c9a430aac59f84ef3cbfab6145068
      ‘ ‘ 9 5e732a1878be2342dbfeff5fe3ca5aa3 x2b4c342f5433ebe591a1da77e013d1b72475562d48578dca8b84bac6651c3cb9
    2. Make sure your floats are equal, not just look equal.
    3. If those really are duplicates, clean them up and reindex table in case you have a corrupted index.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search