I have a table with a unique constraint over 5 columns, all nullable=false. These columns are varchar(30), varchar, 3 float8 columns. There is an autoincrement id column, but that is not included in the constraint.
Often times I have duplicates and so my query has "…on conflict do nothing". I have seen a batch insert of rows I that I to know have all duplicates, and so I expect to have 0 affected rows. Instead I have some dupes inserted (From what i have seen it’s about 5-15%, it’s not constant).
After this, I queried the table with a select distinct across only all columns in the constraint, and postgres will returned the duplicate rows with duplicate values. Same goes for a select query with GROUP BY on all columns in the constraint. I expect the dupe rows to collapse into one, but instead I see both duplicate rows returned to me.
However, when I manually delete one of the two duplicate rows, and then try to insert the single duplicate row then postgresql will rightly tell me there is a conflict and i wont be able to insert.
Am I missing something? My code is now adjusted to insert rows one by one, but of course that is a lot less efficient.
2
Answers
Your experiments with
GROUP BY
andDISTINCT
prove that the rows are not really identical. I can think of two likely causes:The
double precision
columns look identical (in all significant digits), but aren’t really. It is always very problematic to perform equality comparisons on floating point numbers. Try settingextra_float_digits = 3
and see if you can spot the difference.Some of the strings look equal, but aren’t. Popular examples are
(space) and
(no-break space),
M
(ASCII letter) andΜ
(Greek upper case letter Mu) orä
(a diaeresis) andä
(a and combining diaeresis—two code points that together form a character).My money is on 1.
quote_literal()
,ascii()
,md5()
,sha256(v::bytea)
when you inspect text values: demoreindex table
in case you have a corrupted index.