skip to Main Content

Working on a MySQL db which contains the Book Title, Author, Publication in a single column(field). The original db designer is said to put all those in single field to allow for easy search.
Age-old client database, so I am NOT allowed to modify it.

Example:

Id BookDetails
1  A Tale of Two Cities|Charles Dickens|Penguin
2  And Then There Were None,Agatha Christie Random House
3  Adventures of Sherlock Holmes Arthur Conan Doyle-Harper Collins
4  Integrated Physics, Sarah Ann Barnes & Nobles
5  Integrated Electronics Millman & Halkias McGraw
6  Integrated Electronics|Millman|McGraw

As one can see from above, the book name, author name, pub names are all jumbled up – some fields they are pipe delimited, somewhere hyphen-delimited, & somewhere no or space delimited.

Current search allows for only 1 word in search query (like "Integrated" which will return row 4, 5 & 6). Among other requirements, I am tasked with extending the search to allow multi-words (Integrated Electronics Halkias – 3 words).

The implementation challenge I am facing is which SQL query will be best?

  1. If I go with BookDetails like '%Integrated%Electronics%Halkias%', it will match 5 perfectly, but will miss 6. But if someone changes to search query order to say ‘Halkias Integrated Electronics’ – it will fail to match anything
  2. If I go with breaking it into single word AND (like BookDetails like '%Integrated%' and BookDetails like '%Electronic%' and BookDetails like '%Halkias%') will only match 5 (while ideally it should match 6 as well).
  3. If I go with breaking it into single word AND (like BookDetails like '%Integrated%' OR BookDetails like '%Electronic%' OR BookDetails like '%Halkias%') will match 4, 5 & 6 (while ideally it should only match 5 & 6 only).
  4. Any other suggestions

I’ve checked existing SO threads (like How do I do a fuzzy match of company names in MYSQL with PHP for auto-complete? – doesnt fit books need) and (Searching for names in a MySQL database that probably has typos – talks about pronunciations/typos which still does not fit books).

Any inputs on possible implementations please

2

Answers


  1. Adding a score might help, to give the better results a higher place in the results:

    SELECT 
      id,
      BookDetails,
      CASE WHEN BookDetails like '%Integrated%'  THEN 1 ELSE 0 END +
      CASE WHEN BookDetails like '%Electronic%'  THEN 1 ELSE 0 END +
      CASE WHEN BookDetails like '%Halkias%'  THEN 1 ELSE 0 END as Score
    FROM books
    WHERE BookDetails like '%Integrated%' 
       or BookDetails like '%Electronic%' 
       or BookDetails like '%Halkias%'
    ORDER BY 3 DESC;
    

    output:

    id BookDetails Score
    5 Integrated Electronics Millman & Halkias McGraw 3
    6 Integrated Electronics|Millman|McGraw 2
    4 Integrated Physics, Sarah Ann Barnes & Nobles 1

    Or, adding a FULLTEXT INDEX (but that needs a change in your database).

    ALTER TABLE books ADD FULLTEXT(BookDetails);
    
    SELECT 
      id,
      BookDetails,
      MATCH (BookDetails)
         AGAINST ('Integrated Electronic Halkias' IN NATURAL LANGUAGE MODE) as Score
    FROM books
    WHERE  MATCH (BookDetails)
         AGAINST ('Integrated Electronic Halkias' IN NATURAL LANGUAGE MODE)
    ORDER BY 3 DESC;
    

    see: DBFIDDLE

    Login or Signup to reply.
  2. Two Things here:

    1. Since you can’t change the DB entries or design, you are left with the only option to play around with your SQL Query PLUS the way you take the inputs from your user.

    2. I am guessing that order of words in the DB may also be random – like there can be entry like "Integrated Electronics Millman & Halkias McGraw", and same book can also be as "Millman Halkias McGraw Integrated Electronics" in your DB. Reason for mentioning is that you will have to keep your queries focused on SINGLE WORDS only, instead of combining them in any particular order. (Same for inputs from user side – they might put words in any order!)

    In that context, the following might help:

    When taking more than one word as inputs (like via a HTML form), try to take those in TWO different fields which are clearly classified to the user as "MUST Contain Words" and "MAY Optionally Contain Words"

    In above example, user can give "Integrated Electronic" in MUST CONTAIN words field, and give "Halkias" as MAY CONTAIN words field.

    Then in your SQL Query, you can put a AND on all the words in the MUST field, and some combo of OR in the MAY field.

    where ((bookdet like '%Integrated%') AND (bookdet like '%Electronic%'))
    

    & rank the rows higher which also have "Halkias" in the field (using order by).

    With no changes possible on DB side, it will depend on the way you collect the details from the user & on the way you create your query.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search