first time asking a question on Stackoverflow and I hope to get some help here:
Im working on a big database and only got basic skills in SQL (using postgresql).
I work with the german ICD10 classification and the so called alphabetical index usually contains a lot of synonyms. An example for code i11.90:
Benigne hypertensive Herzkrankheit
Benigne Hypertonie bei Herzkrankheit
Bluthochdruck mit Herzbeteiligung
Cor hypertonicum
HHK [Hypertensive Herzkrankheit]
Hochdruck mit Herzbeteiligung
Hochdruck mit Herzkrankheit
Hypertensive Herzhypertrophie
Hypertensive Herzkrankheit
Hypertensive Kardiomyopathie
Hypertensive Myokarddegeneration
Hypertone Herzkrankheit
Hypertonie mit Altersherz
Hypertonie mit Herzbeteiligung
Hypertonieherz
Kardiovaskuläre Hypertonie
Maligne hypertensive Herzkrankheit
Maligne Hypertonie mit Herzbeteiligung
I try to come up with a SQL query which cheks if the isolated string within the [] in this case Hypertensive Herzkrankheit is already existing as separate entry within the list of synonyms.
In this example as u can see the Hypetensive Herzkrankheit within the brackets is already existing as separate entry.
I already asked chatGPT whether it could help but either my prompt engineering skills are not good enough or chatGPT is not able to come up with the solution.
It gave me this first solution:
SELECT *
FROM your_table
WHERE REGEXP_SUBSTR(your_column, '\[([^\]]+)\]') IS NOT NULL;
which does not work and tbh I dont understand the part after REGEXP_SUBSTR
After I specified that I use postgresql it gave me this solution
SELECT *
FROM your_table
WHERE your_column ~ '.*\[([^\]]+)\]';
but this is not solving my problem since its just the basic regex which I am familiar with, not checking whether a substring is already existing as separate entry withing the table.
Would be great to get some help on that – many thanks in advance.
2
Answers
I came up with following solution for my problem - perhaps there might be even a cleaner way to write the query with an INNER JOIN but this is at least somehow solving my problem:
Just asking ChatGPT to solve it for you is not great evidence of research before asking.
Do you want every synonym with [] which also occurs as a separate entry? Or just to find the answer for a particular synonym, one at a time?
If the former, how about generating a view with only the synonyms, and a view with any synonyms removed* (both using fairly standard string searching and manipulation), and joining them?
*I’m suggesting this second view (rather than just joining the first view to the raw table) because otherwise, if the synonym itself has a synonym you might miss the match.
Also, are you sure each entry has at most one synonym? If potentially more than one then it is still possible, but more fiddly (as splitting each row with synonyms into potentially multiple rows, as you would need to, is possible but fiddly).