skip to Main Content

We have a list of Song names (a simple List), a user sends a text message asking for 5 songs, something like

Hi team how are you, these are my 4 favourite songs

  1. Queen – Bohemian Rhaspody
  2. The Beatles – Hey Jube
  3. Michael Jackson – Billy Jean
  4. A-ha – Take on Ne

Oh actually I want to add this on the list as well
5. Aerosmith – Dream on

What would be the best way to compare the string sent by the user with the list of correct songs, and to autocorrect the user’s input and as a result get a new string with correct titles:

1. Queen - Bohemian Rhapsody
2. The Beatles - Hey Jude
3. Michael Jackson - Billie Jean
4. A-ha - Take On Me
5. Aerosmith - Dream On

I was looking at https://azure.microsoft.com/en-gb/products/ai-services/ai-language but do not see option to correct values based on pre pre-defined list. We can’t use a LLM for this as our correct list has 25,000 names and would be too crazy to run each input trough it

Does anyone have a suggest, ideally I’m looking for a c# library or an external API

thanks!

2

Answers


  1. If I understand this correctly, your correct songs list has 25,000 songs in it. That is not very many in the grand scheme of things, and it’s hard to justify using an LLM or even an API when you already have the master list.

    However, if you are dead set on an API, just use one of the many song names or song lyric services available.

    But without knowing more about your use case, I would suggest a simple approach. For example, loading the correct values into a data structure that is quick to string search (a hash table, for example).

    Finding the correct song is best achieved by using multiple methods, and no matter what methods you choose, you need to define some rules.

    First, before doing anything computationally heavy, just check if the name is correct (has a match).
    If it is not correct (no perfect match) then you need to decide what correct song most closely matches it.

    I would personally explore the idea of breaking the song submissions down into a list of artists and a list of songs attributed to that artist. That can help you narrow down options. The song title might be incorrect, but the artist may be correct. Don’t waste time searching all the songs if the artist is correct.

    After that, you need to make some rules. What correct string is the incorrect string closest to? You can use something like Levenshtein distance to put a physical number to how close one string is to another. But at the end of the day you will have to decide how close it needs to be, and what happens in the case of a tie.

    Alternatively, when a string has no match, you might consider re-prompting the sender with some correct options that are close to it.
    Example:
    You submitted: "The Beatles – Hey Jube" We don’t have this song in our list.
    Did you mean (y/n): "The Beatles – Hey Jude"?

    Login or Signup to reply.
  2. What I would do is create trigrams from the user input (each group of three letters, such as "Hey Jube" -> "Hey", "ey ", "y J", " Ju", "Jub", "ube") and then have a table or map from trigrams to lists of titles.

    (You could additionally canonicalize the strings before splitting them into trigrams, such as by smashing the case, normalizing whitespace, and removing diacritics.)

    Then, for each title returned from the trigrams of the input, calculate the edit distance between the title and the user input (by Levenshtein, Hamming distance, etc.) and return a list of the closest matches, or just take the best match if the edit distance is low enough.

    It would be worth it to check if there is an exact match first since that could be done in linear time and if there is one, the rest of the process can be skipped.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search