I want to be able to search for people "similar" to what I have in the database (e.g. a person with a typo in their name or date of birth). I want to use the levenshtein
function for that in the following way:
create temp table person (first_name text, last_name text, dob date);
select * from person where levenshtein('johndoe2000-01-01', lower(first_name || last_name || dob)) < 4;
(I am aware this can be probably optimized a bit by e.g. using YYYYMMDD
instead of YYYY-MM-DD
, but that is irrelevant to my question.)
I thought that I could speed up the search with an index. However, I don’t know how to create one that would work:
create index fuzzy on person ((lower(first_name || last_name || dob)));
results in ERROR: functions in index expression must be marked IMMUTABLE
. I tried a few things: using concat
instead of ||
, casting dob
to text
explicitly or with to_char
, but to no avail. What am I doing wrong? It seems that the problem results from mixing text
and date
– I can create an index with only text
columns or only the date
column without a problem.
2
Answers
The exercise is pointless: you can never use an index to speed up a query that uses
levenshtein()
.Casting dates to strings depends on database setting that might change, and so it is not immutable.
You could solve that problem in a variety of ways. But once you do, it still wouldn’t work as levenshtein distance is generally not indexable.