I’m trying to extract only a certain portion of a string URL field in my query. I am doing this in Amazon Redshift. The query is very simple, essentially
SELECT email_id, url
FROM emailsends
WHERE date > '01-01-2023'
Every URL is structured like this:
https://www.example.com/489r9759t9t897987&utm_source=summer-mailer&utm_medium=email&utm_campaign=summer-sale
What I want from the URL is to extract utm_source and utm_campaign. I’d like them to be their own fields, so my output would be something like this:
email_id | url | utm_source | utm_campaign |
---|---|---|---|
1234 | google.com/dmecm | summer-mailer | summer-sale |
Is there a simple way to do this within the query? Substring by fixed positions doesn’t work because the URLs all have different lengths. I’d need to start at &utm_source and stop at the next &, where utm_medium begins, for instance. I’m just unsure how to properly extract and parse.
I have tried the SUBSTRING function and several regex functions, e.g. REGEXP_SUBSTR. I am using DbVisualizer, which can be a bit limiting. I’d like to keep this simple and not have to create a temp table or anything like that.
2 Answers
In MySQL 8.x you can use REGEXP_SUBSTR() to get the entire parameter, and then SUBSTRING_INDEX() to extract the part after =.
You can also write it as a simple SELECT query like this:
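A minimal sketch of the REGEXP_SUBSTR() plus SUBSTRING_INDEX() combination, using the emailsends table and columns from the question. This is MySQL 8.x syntax, not Redshift, and the pattern `[^&]*` assumes each parameter value runs up to the next & or the end of the URL:

```sql
-- MySQL 8.x sketch; table and column names come from the question.
-- Add the question's date filter as needed.
SELECT
    email_id,
    url,
    -- REGEXP_SUBSTR returns the whole parameter, e.g. 'utm_source=summer-mailer';
    -- SUBSTRING_INDEX(..., '=', -1) keeps the text after the last '='.
    SUBSTRING_INDEX(REGEXP_SUBSTR(url, 'utm_source=[^&]*'), '=', -1)   AS utm_source,
    SUBSTRING_INDEX(REGEXP_SUBSTR(url, 'utm_campaign=[^&]*'), '=', -1) AS utm_campaign
FROM emailsends;
```

If a URL has no such parameter, REGEXP_SUBSTR() returns NULL and so does the whole expression. A value that itself contains = would be cut at its last =, so this sketch assumes values have none.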
The SUBSTRING function also accepts regular expressions.
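As a rough illustration of the regex form of SUBSTRING: the first query below uses the PostgreSQL syntax, where SUBSTRING(string FROM pattern) returns the first parenthesized group. Whether Redshift accepts this form is something to verify. The second query is an assumed Redshift alternative using REGEXP_SUBSTR with the 'e' parameter, which extracts the first subexpression. Table and column names are taken from the question, and the patterns are illustrative:

```sql
-- PostgreSQL form: returns the text matched by the first parenthesized group,
-- or NULL when the pattern does not match.
SELECT
    email_id,
    url,
    SUBSTRING(url FROM 'utm_source=([^&]*)')   AS utm_source,
    SUBSTRING(url FROM 'utm_campaign=([^&]*)') AS utm_campaign
FROM emailsends;

-- Assumed Redshift alternative: the arguments are position 1, occurrence 1,
-- and 'e' to return the first subexpression instead of the whole match.
SELECT
    email_id,
    url,
    REGEXP_SUBSTR(url, 'utm_source=([^&]*)', 1, 1, 'e')   AS utm_source,
    REGEXP_SUBSTR(url, 'utm_campaign=([^&]*)', 1, 1, 'e') AS utm_campaign
FROM emailsends;
```

Both queries compute the two fields inline, so no temp table is needed.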