I need to implement a simple text classification using regex. I thought of using a simple CASE WHEN statement, but rather than stopping at the first condition that is met, I want to evaluate all of the CASE branches.
For example:
with `table` as(
SELECT 'It is undeniable that AI will change the landscape of the future. There is a frequent increase in the demand for AI-related jobs, especially in data science and machine learning positions. It is believed that artificial intelligence will change the world, just like how electricity changed the world about 100 years ago. As Professor Andrew NG has famously stated multiple times “Artificial Intelligence is the new electricity.” We have advanced immensely in the field of artificial intelligence. With the increase in the processing and computational power, thanks to graphical processing units (GPUs), and also due to the abundance of data, we have reached a position of supremacy in Deep Learning and modern algorithms.' as text
)
SELECT
  CASE
    WHEN REGEXP_CONTAINS(text, r'(?i)ai') THEN 'AI'
    WHEN REGEXP_CONTAINS(text, r'(?i)computational power') THEN 'Engineering'
    WHEN REGEXP_CONTAINS(text, r'(?i)deep learning') THEN 'Deep Learning'
  END AS topic,
  text
FROM `table`
With this query, the text is classified only as AI, because that is the first condition that is met. It should instead be classified as AI, Engineering, and Deep Learning, either in an array or in three separate rows, because all three conditions are met.
How can I classify the text applying all the regex/conditions?
3 Answers
One method is string concatenation:
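A minimal sketch of the concatenation idea, not necessarily the exact query the answer had in mind. It assumes a shortened, made-up sample text in place of the long paragraph from the question, and it reuses the question's three patterns unchanged. Each condition is tested by its own `IF`, so one match does not hide the others.

```sql
WITH `table` AS (
  -- Hypothetical, shortened stand-in for the text in the question.
  SELECT 'AI and deep learning both depend on computational power.' AS text
)
SELECT
  -- Every IF is evaluated independently; non-matching rules contribute ''.
  -- RTRIM strips the trailing comma and space left by the last matching label.
  RTRIM(CONCAT(
    IF(REGEXP_CONTAINS(text, r'(?i)ai'), 'AI, ', ''),
    IF(REGEXP_CONTAINS(text, r'(?i)computational power'), 'Engineering, ', ''),
    IF(REGEXP_CONTAINS(text, r'(?i)deep learning'), 'Deep Learning, ', '')
  ), ', ') AS topics,
  text
FROM `table`;
```

The empty-string fallback matters because `CONCAT` returns NULL as soon as any argument is NULL.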
Actually, this constructs a string. You can use similar-ish logic to construct an array instead.
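One way the array variant could look, again as a sketch over a made-up sample text rather than the answer's own query. Non-matching rules produce NULL, and the NULLs are filtered out before the array is returned, since BigQuery does not allow NULL elements in an array in the query result.

```sql
WITH `table` AS (
  -- Hypothetical, shortened stand-in for the text in the question.
  SELECT 'AI and deep learning both depend on computational power.' AS text
)
SELECT
  ARRAY(
    SELECT topic
    FROM UNNEST([
      -- One entry per rule: the label if the pattern matches, otherwise NULL.
      IF(REGEXP_CONTAINS(text, r'(?i)ai'), 'AI', NULL),
      IF(REGEXP_CONTAINS(text, r'(?i)computational power'), 'Engineering', NULL),
      IF(REGEXP_CONTAINS(text, r'(?i)deep learning'), 'Deep Learning', NULL)
    ]) AS topic
    WHERE topic IS NOT NULL  -- drop the rules that did not match
  ) AS topics,
  text
FROM `table`;
```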
Below is for BigQuery Standard SQL
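The following is a sketch of one possible BigQuery Standard SQL query for the "3 different rows" form of the result; it is an illustration under assumptions, not a reconstruction of this answer's original code. The sample text is a made-up, shortened stand-in for the paragraph in the question.

```sql
WITH `table` AS (
  -- Hypothetical, shortened stand-in for the text in the question.
  SELECT 'AI and deep learning both depend on computational power.' AS text
)
SELECT
  topic,
  text
FROM `table`,
-- The comma before UNNEST is a correlated cross join: each input row is
-- paired with one candidate label per rule.
UNNEST([
  IF(REGEXP_CONTAINS(text, r'(?i)ai'), 'AI', NULL),
  IF(REGEXP_CONTAINS(text, r'(?i)computational power'), 'Engineering', NULL),
  IF(REGEXP_CONTAINS(text, r'(?i)deep learning'), 'Deep Learning', NULL)
]) AS topic
WHERE topic IS NOT NULL;  -- keep one row per rule that actually matched
```

Note that the question's pattern `(?i)ai` also matches inside longer words such as "undeniable"; a stricter pattern like `(?i)\bai\b` would restrict it to the standalone word.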
Applied to the sample data from your question, the output is
I feel the approach below is the most generic and reusable solution (BigQuery Standard SQL)
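A sketch of what a generic, reusable version might look like: the rules live in their own table-like CTE, so adding a topic means adding a row rather than editing the query logic. The `rules` CTE, its column names `topic` and `pattern`, and the shortened sample text are all assumptions made for illustration.

```sql
WITH `table` AS (
  -- Hypothetical, shortened stand-in for the text in the question.
  SELECT 'AI and deep learning both depend on computational power.' AS text
),
rules AS (
  -- Hypothetical lookup table: one row per (label, regex) pair.
  SELECT 'AI' AS topic, r'(?i)ai' AS pattern UNION ALL
  SELECT 'Engineering', r'(?i)computational power' UNION ALL
  SELECT 'Deep Learning', r'(?i)deep learning'
)
SELECT
  t.text,
  ARRAY_AGG(r.topic) AS topics  -- collect every matching label per text
FROM `table` AS t
CROSS JOIN rules AS r
WHERE REGEXP_CONTAINS(t.text, r.pattern)  -- pattern comes from the rules row
GROUP BY t.text;
```

Two caveats with this shape: texts that match no rule drop out of the result (a `LEFT JOIN` on the same condition would keep them), and `ARRAY_AGG` does not guarantee element order unless an `ORDER BY` is added inside it. Dropping the `GROUP BY` and selecting `r.topic` directly gives one row per matching topic instead of an array.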
with output