Postgresql unique constraint mystery that does not seem to work with batch inserts

C3820C
October 18, 2023
228 views
1 vote
2 Answers

I have a table with a unique constraint over 5 columns, all nullable=false. These columns are varchar(30), varchar, 3 float8 columns. There is an autoincrement id column, but that is not included in the constraint.

Often times I have duplicates and so my query has "…on conflict do nothing". I have seen a batch insert of rows I that I to know have all duplicates, and so I expect to have 0 affected rows. Instead I have some dupes inserted (From what i have seen it’s about 5-15%, it’s not constant).

After this, I queried the table with a select distinct across only all columns in the constraint, and postgres will returned the duplicate rows with duplicate values. Same goes for a select query with GROUP BY on all columns in the constraint. I expect the dupe rows to collapse into one, but instead I see both duplicate rows returned to me.

However, when I manually delete one of the two duplicate rows, and then try to insert the single duplicate row then postgresql will rightly tell me there is a conflict and i wont be able to insert.

Am I missing something? My code is now adjusted to insert rows one by one, but of course that is a lot less efficient.

Tags: postgresql unique-constraint

Answers

- LaurenzAlbe
- October 18, 2023 at 9:40 pm
- 0 votes
0
Your experiments with GROUP BY and DISTINCT prove that the rows are not really identical. I can think of two likely causes:
1. The double precision columns look identical (in all significant digits), but aren’t really. It is always very problematic to perform equality comparisons on floating point numbers. Try setting extra_float_digits = 3 and see if you can spot the difference.
2. Some of the strings look equal, but aren’t. Popular examples are (space) and (no-break space), M (ASCII letter) and Μ (Greek upper case letter Mu) or ä (a diaeresis) and ä (a and combining diaeresis—two code points that together form a character).
My money is on 1.
Login or Signup to reply.

Use quote_literal(), ascii(), md5(), sha256(v::bytea) when you inspect text values: demo

select v,quote_literal(v),ascii(v),md5(v),sha256(v::bytea) 
from (values 
  (''   ),
  (' '  ),
  (E't')) AS _(v);

quote_literal	ascii	md5	sha256
”	0	d41d8cd98f00b204e9800998ecf8427e	xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
‘ ‘	32	7215ee9c7d9dc229d2921a40e899ec5f	x36a9e7f1c95b82ffb99743e0c5c4ce95d83c9a430aac59f84ef3cbfab6145068
‘ ‘	9	5e732a1878be2342dbfeff5fe3ca5aa3	x2b4c342f5433ebe591a1da77e013d1b72475562d48578dca8b84bac6651c3cb9

Make sure your floats are equal, not just look equal.
If those really are duplicates, clean them up and reindex table in case you have a corrupted index.

Please signup or login to give your own answer.

Click here to cancel reply.