skip to Main Content

I am in need to use the Word Boundary(y) as a condition to match the exact word which is been searched as a Query Param. I am writing this regex and executing in PostgreSQL but it is returning FALSE.

text_search_value = "apple." (Which needs to be exactly matched in the {doc_table} which is presented in the code)

My code

vector_split_text = re.sub(".|-|/", " ", text_search_value)
text_search_value = re.escape((re.sub(")",")",re.sub("(","(",text_search_value))))
converted_part_query = f""""{doc_table}".lni IN (SELECT BTRIM(xxx) FROM "{docsplit_table}" WHERE yyy IN ({all_rcis}) AND doc_text_vector @@ phraseto_tsquery('simple', '{vector_split_text}'))"""
               converted_part_query = (
                    "("
                    + converted_part_query
                    + f""" and "{doc_table}".document::text  ~ 'y{text_search_value}y'"""
                    + ")"
                )

PostgreSQL Query

select 'apple.' ~ 'yapple.y'

I might need help in fixing the regex to handle the exact search for the word "apple."

Expected Result
The Select Query should return TRUE while executing in the PostgreSQL Table with the help of regex fixed.

2

Answers


  1. I’m sorry, I might be confused, but if you’re using a Python regex command, y is not a word boundary. Instead, you’re using the literal "y" letter to match your word. You should use b for a word boundary. The correct regex syntax should look like the following:

    select ‘apple.’ ~ ‘bappleb’

    made with regex101

    Login or Signup to reply.
  2. The literal dot . doesn’t belong to the word character class:

    A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is any character belonging to the word character class, that is, any letter, digit, or underscore.

    That, combined with two meanings of y:

    y matches only at the beginning or end of a word

    And the note under the table this is from:

    A word is defined as in the specification of [[:<:]] and [[:>:]] above.

    Means your pattern expects another word to start at the second y.
    demo at db<>fiddle

    select 'apple.' ~ 'yapple.y';
    
    False
    select 'apple.plum' ~ 'yapple.y'
    
    True

    If you’re looking for an apple followed by a dot and not preceded or followed by any word characters, use 'yapple.Y'. Or a 'yappley' if you don’t really need the dot.

    I’m not sure what it is you’re trying to achieve exactly but if you’re trying to wrap user input, you might need to trim() non-word characters before adding an m on the front and an M at the end to avoid this, or switch between y and Y based on the leading/trailing character, if you don’t want to match anything with another word glued to it.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search