
I’m building a web crawler that stores its crawl results in a MySQL table.

There are five main columns: URL, TITLE, DESCRIPTION, KEYWORDS, BODY.

Currently I’m using MySQL’s FULLTEXT search as follows:

SELECT URL, title, description,
       MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE) AS score
FROM record
WHERE MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE)
ORDER BY score DESC;

But it is not giving me good results. For example, when searching for "Facebook", the Facebook entry only shows up at the 23rd position.

Can I prioritize the search based on the column name? For example, I want the query to give maximum priority to URL, then description, then title, then keywords, and finally body.

Any suggestions?

2

Answers


  1. Look at something like SoundEx:

    See: http://www.madirish.net/?article=85

    Additionally, could you not consider doing the weighting yourself? (I don’t have MySQL installed locally, so sorry for the semi-pseudocode.)

    SELECT
        URL,
        title,
        description,
        -- Per-column relevance scores, returned for inspection
        MATCH (URL) AGAINST ('$keyword' IN BOOLEAN MODE) AS urlscore,
        MATCH (description) AGAINST ('$keyword' IN BOOLEAN MODE) AS descscore,
        MATCH (title) AGAINST ('$keyword' IN BOOLEAN MODE) AS titlescore,
        MATCH (body) AGAINST ('$keyword' IN BOOLEAN MODE) AS bodyscore,
        -- Weighted total: URL counts most, body least
        (MATCH (URL) AGAINST ('$keyword' IN BOOLEAN MODE) * 4)
        + (MATCH (description) AGAINST ('$keyword' IN BOOLEAN MODE) * 3)
        + (MATCH (title) AGAINST ('$keyword' IN BOOLEAN MODE) * 2)
        + (MATCH (body) AGAINST ('$keyword' IN BOOLEAN MODE) * 1) AS weightedscore
    FROM
        record
    WHERE
        MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE)
    ORDER BY
        weightedscore DESC;
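
    One prerequisite for the per-column scores: MySQL generally expects the column list in each MATCH() to correspond to a FULLTEXT index defined on exactly those columns (boolean-mode searches on MyISAM can run without one, but slowly). A minimal sketch of the extra single-column indexes, assuming the record table and column names from the question; the index names are made up for illustration:

    ```sql
    -- Hypothetical index names; table and columns are taken from the question.
    -- Kept as separate statements, since InnoDB may refuse to build
    -- several FULLTEXT indexes in a single ALTER TABLE.
    CREATE FULLTEXT INDEX ft_url         ON record (URL);
    CREATE FULLTEXT INDEX ft_description ON record (description);
    CREATE FULLTEXT INDEX ft_title       ON record (title);
    CREATE FULLTEXT INDEX ft_body        ON record (body);
    ```

    The combined index on (description, keywords, title, URL) that the WHERE clause relies on presumably exists already, since the original query uses it.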
    
  2. Just use the LIKE operator for URL matching:

    SELECT URL, title, description,
           MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE) AS score
    FROM record
    WHERE URL LIKE '%$keyword%'
       OR MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE)
    ORDER BY score DESC;
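
    One thing to watch with the LIKE approach: a row that matches only through URL LIKE gets a full-text score of 0, so ORDER BY score DESC places it at the bottom. A possible tweak, offered as an untested sketch, is to sort on the LIKE result first (MySQL evaluates the comparison to 1 or 0). The url_match alias is hypothetical:

    ```sql
    SELECT URL, title, description,
           URL LIKE '%$keyword%' AS url_match,  -- 1 if the URL contains the keyword, else 0
           MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE) AS score
    FROM record
    WHERE URL LIKE '%$keyword%'
       OR MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE)
    ORDER BY url_match DESC, score DESC;  -- URL hits first, then by full-text relevance
    ```

    Note that a LIKE pattern with a leading % cannot use an ordinary index on URL, so this may get slow on a large table.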
