
I’m trying to extract only a certain portion of a URL string field in my query. I am doing this in Amazon Redshift. The query is very simple, essentially:

SELECT email_id, url
FROM emailsends
WHERE date > '01-01-2023'

Pretty simple overall. Every URL is structured roughly like this:

https://www.example.com/489r9759t9t897987&utm_source=summer-mailer&utm_medium=email&utm_campaign=summer-sale

What I want from the URL is to extract utm_source and utm_campaign. I’d like them to be their own fields, so my output would be something like this:

email_id  url               utm_source     utm_campaign
1234      google.com/dmecm  summer-mailer  summer-sale

Is there a simple way to do this within the query? Substring by text position doesn’t really work because the URLs all have different lengths. I’d need to start at &utm_source and stop at the next &, where utm_medium begins, for instance. I’m just unsure how to properly extract and parse.

I have tried using the SUBSTRING function and different kinds of regex functions, e.g. REGEXP_SUBSTR. I am using DbVisualizer, which can be a bit limiting. I’d like to keep this simple and not have to create a temp table or something.

2 Answers


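  1. In MySQL 8.x you can use REGEXP_SUBSTR() to get the entire parameter, and then SUBSTRING_INDEX() to extract the part after =. Note that SUBSTRING_INDEX() is a MySQL function, so on Redshift the second step would need a different function, such as SPLIT_PART().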

    SUBSTRING_INDEX(REGEXP_SUBSTR(url, 'utm_source=[^&]+'), '=', -1)
    
  2. You can also write it as a simple SELECT query like this:

    SELECT
        email_id,
        url,
        SUBSTRING(url, 'utm_source=([^&]*?|$)') AS utm_source,
        SUBSTRING(url, 'utm_campaign=([^&]*?|$)') AS utm_campaign
    FROM
        emailsends;
    

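    This relies on the PostgreSQL form of SUBSTRING, which accepts a regular expression and returns the text matched by the first parenthesized group. Redshift documents REGEXP_SUBSTR for regex extraction, so this form of SUBSTRING may not be available there.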
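
Since the question is about Amazon Redshift, here is a minimal sketch of the same idea using functions that Redshift documents: REGEXP_SUBSTR with the 'e' parameter (return the first parenthesized subexpression instead of the whole match) and, as a regex-free alternative, nested SPLIT_PART calls. The table and column names come from the question; the `utm_source_alt` alias is made up for illustration, and the patterns assume each key appears at most once per URL.

```sql
-- Sketch for Amazon Redshift; emailsends, email_id, url and date are the names used in the question.
SELECT
    email_id,
    url,
    -- Arguments: source, pattern, start position, occurrence, parameters.
    -- 'e' returns only the captured group, i.e. the value after '='.
    REGEXP_SUBSTR(url, 'utm_source=([^&]*)', 1, 1, 'e')   AS utm_source,
    REGEXP_SUBSTR(url, 'utm_campaign=([^&]*)', 1, 1, 'e') AS utm_campaign,
    -- Regex-free alternative (hypothetical alias): take the text after the key,
    -- then cut it off at the next '&'.
    SPLIT_PART(SPLIT_PART(url, 'utm_source=', 2), '&', 1) AS utm_source_alt
FROM emailsends
WHERE date > '01-01-2023';
```

Both functions return an empty string rather than NULL when the parameter is missing from the URL, so wrapping an expression in NULLIF(..., '') is one option if NULLs are preferred. Neither approach needs a temp table, and both are plain SELECT expressions, so they should run from DbVisualizer like any other query.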
