Javascript - Search for smoking status in medical notes using regex

Expatchy
December 17, 2023
159 views
0 votes
2 Answers

I want to search for smoking status and it’s variation from medical notes text using regex by searching "smoking" and its variations.

But some of the notes contain "never smoke", "denies any smoking","Denies ever using tobacco" and such phrase. Which mean they don’t smoke.

How do I exclude this phrase from the regex?

Currently, my regex looks like this : b(?:smoking|smoker|smoked|cigarette(s)?|tobacco)b

It’s searching for those words perfectly, but as I mentioned before it also includes phrases like "denies smoking" which means the patient don’t smoke

Answers

- Kavelin
- December 16, 2023 at 11:13 pm
- 0 votes
0
You can use the same regex b(?:smoking|smoker|smoked|cigarette(s)?|tobacco)b and check to see if the same sentence have the words have ‘never’, ‘does not’ or ‘denies’ in the medical notes as well.

I’m not sure how big these notes are, and there might be a note such as ‘Doesn’t drink. Is a smoker’ or something along those lines. If they are written in sentences you may want to split the medical notes into individual sentences with notes.split('.') and then check in each sentence.

Even so, searching for words with regex is probably unreliable based on how much data you have. If you have notes that have a lot of variety in them then it would probably be better to use a machine learning api to go through each patient’s medical notes to extract the keywords that indicate that the patient is a smoker.

Login or Signup to reply.

- CarySwoveland
- December 17, 2023 at 1:43 am
- 0 votes
0
Suppose you wish to wish to identify the following strings, provided they are not part of one of the excluding strings that I list subsequently.
```
smoked
smoking
smoker
lit up
cigarette
cigarettes
```
The excluding strings are as follows.
```
never smoked
lit up a party
hated cigarettes
```
"smoked" is to be identified in the text, for example, unless it is part of the string "never smoked".

We can then match text with the following regular expression (with the case-indifferent flag set).
```
b(?:never smoked|lit up the party|hated cigarette|(smoked|smoker|lit up|cigarettes?))b
```
Notice that in the alternation the excluding strings come first. These strings are to be matched but not captured. They are followed by the strings to be identified, which are captured (to group 1) as well as matched.

Suppose the text were as follows.
```
Bob smoked but Lois never smoked.
    cccccc          mmmmmmmmmmmm 

Tim lit up his pipe and then lit up the party with his singing.        
    cccccc                   mmmmmmmmmmmmmmmm

Lulu left because she hated cigarette smoke.
                      mmmmmmmmmmmmmmm
```
As shown at regex101.com, strings underlined with c’s are both matched and captured; strings underlined with m’s are matched but not captured.

The key, of course, is to list the excluding strings before the strings to be identified.

The procedure is to (in code) skip over (disregard) matches that are not captured, keeping only matches that are captured.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Javascript – Search for smoking status in medical notes using regex

Answers