
I have the following string

https://test.io/content/storage/id/urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1;revision=0?component_id=e62a5567-066d-452a-b147-19d909396132 

I need to use regex to get the following string

urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1

from the url above.

The string will always begin with a urn and end with a letter or number.
Can someone please help? How can I do this using TypeScript?

Here’s what I tried, but it gave me a null value. Thanks!

    function extractAssetIdFromUrl(url: string) {
        // Regular expression to match the desired pattern
        const regex = /urn[\w-]+/;
        
        // Use the regex to find the match in the URL
        const match = url.match(regex);
    
        // Check if there is a match and return it, otherwise return null
        return match ? match[0] : null;
    }

3 Answers


  1. First extract the last path segment. Then you can match it against your required characters without having to worry about the query string or any other part of the URL.

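    function extractAssetIdFromUrl(url: string): string | null {
      // as per requirements
      // "The string will always begin with a urn and end with a letter or number"
      const rx = /^urn[a-z0-9:-]*[a-z0-9]/i;
      
      // last path segment; pathname excludes the query string
      const baseName = new URL(url).pathname.split('/').at(-1);
      // at(-1) is typed as possibly undefined, hence the optional chaining
      return baseName?.match(rx)?.[0] ?? null;
    }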
    
    console.log(
      extractAssetIdFromUrl(
        "https://test.io/content/storage/id/urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1;revision=0?component_id=e62a5567-066d-452a-b147-19d909396132",
      ),
    );
    
    console.log(
      extractAssetIdFromUrl(
        "https://example.com/foo/bar/baz",
      ),
    );
  2. It looks like your id is terminated with a ;, so I’d assume a regex for "starts with urn:", and "then everything that isn’t ;" should work just fine:

    const url = `https://test.io/content/storage/id/urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1;revision=0?component_id=e62a5567-066d-452a-b147-19d909396132`;
    
    // match "/urn:" followed by everything up to (but not including) the first ";"
    const match = url.match(/\/(urn:[^;]+);/);
    if (match) console.log(match[1]);

    And then, of course,

  3. https://regexr.com is very useful for understanding and deconstructing regex patterns.

    This regex will do what you want based on the example pattern provided:

    /urn:(?:[^:]+:)*[a-z0-9-]+/ig
    

    To use this pattern to extract it from the example URL string you provided, you could do this:

    const url = 'https://test.io/content/storage/id/urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1;revision=0?component_id=e62a5567-066d-452a-b147-19d909396132';
    // exec() returns null when nothing matches, so guard before indexing
    const urnId = /urn:(?:[^:]+:)*[a-z0-9-]+/ig.exec(url)?.[0] ?? null;
    // = 'urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1'
    

    As others have said, though, I would recommend first extracting the path out of the URL, as URLs can have things like query strings, which could also contain a pattern that matches this, resulting in possible unintended results.
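
    To make that risk concrete, here is a small sketch. The `tricky` URL is made up for the demonstration (its `ref` query parameter is hypothetical), and `findUrn` is just an illustrative helper name. It assumes a runtime where the global `URL` class is available, such as a browser or a recent Node.js:

    ```typescript
    const urnPattern = /urn:(?:[^:]+:)*[a-z0-9-]+/i;

    // Hypothetical helper: run the pattern against any string.
    function findUrn(text: string): string | null {
      return urnPattern.exec(text)?.[0] ?? null;
    }

    // Made-up URL: no URN in the path, but a URN-like value in the query string.
    const tricky =
      "https://test.io/content/storage/id/12345?ref=urn:aaid:sc:US:0000-aaaa";

    // Matching the whole URL picks up the query-string value.
    const fromWholeUrl = findUrn(tricky);
    // Matching only the path finds nothing, which is the correct answer here.
    const fromPathOnly = findUrn(new URL(tricky).pathname);

    console.log(fromWholeUrl); // "urn:aaid:sc:US:0000-aaaa"
    console.log(fromPathOnly); // null
    ```

    Whether the query-string hit counts as a bug depends on your URLs, but restricting the search to `pathname` removes the ambiguity.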

    Also, others have mentioned that there appears to be a semicolon delimiter in the URL, which could make your work much easier if it reliably exists in the URL (and could possibly even avoid the need for regex).
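
    A regex-free version of that idea might look like the sketch below. It assumes the ID is always the last path segment and that anything after the first `;` in that segment is a parameter such as `revision=0`. The function name `extractAssetIdBySemicolon` is made up for this example:

    ```typescript
    // Hypothetical helper; assumes the ID is the last path segment and that
    // everything after the first ";" in that segment is a parameter.
    function extractAssetIdBySemicolon(url: string): string | null {
      // Parse first so the query string and fragment are out of the picture.
      const segments = new URL(url).pathname.split("/");
      const lastSegment = segments[segments.length - 1] ?? "";

      // Keep the part before the first ";" (the whole segment if there is none).
      const candidate = lastSegment.split(";")[0] ?? "";

      // Only accept values that look like the expected identifier.
      return candidate.startsWith("urn:") ? candidate : null;
    }

    console.log(
      extractAssetIdBySemicolon(
        "https://test.io/content/storage/id/urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1;revision=0?component_id=e62a5567-066d-452a-b147-19d909396132",
      ),
    ); // "urn:aaid:sc:US:8eda16d4-baba-4c90-84ca-0f4c215358a1"
    ```

    Note that `new URL(url)` throws on input that is not a valid absolute URL, so wrap the call in `try`/`catch` if the input is untrusted.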
