skip to Main Content

I want to be able to search for people "similar" to what I have in the database (e.g. a person with a typo in their name or date of birth). I want to use the levenshtein function for that in the following way:

create temp table person (first_name text, last_name text, dob date);
select * from person where levenshtein('johndoe2000-01-01', lower(first_name || last_name || dob)) < 4;

(I am aware this can be probably optimized a bit by e.g. using YYYYMMDD instead of YYYY-MM-DD, but that is irrelevant to my question.)

I thought that I could speed up the search with an index. However, I don’t know how to create one that would work:

create index fuzzy on person ((lower(first_name || last_name || dob)));

results in ERROR: functions in index expression must be marked IMMUTABLE. I tried a few things: using concat instead of ||, casting dob to text explicitly or with to_char, but to no avail. What am I doing wrong? It seems that the problem results from mixing text and date – I can create an index with only text columns or only the date column without a problem.

2

Answers


  1. The exercise is pointless: you can never use an index to speed up a query that uses levenshtein().

    Login or Signup to reply.
  2. Casting dates to strings depends on database setting that might change, and so it is not immutable.

    select '2021-04-03'::date::text;
     2021-04-03
    set datestyle TO 'german';
    select '2021-04-03'::date::text;
     03.04.2021
    

    You could solve that problem in a variety of ways. But once you do, it still wouldn’t work as levenshtein distance is generally not indexable.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search