
I’m working with strings. The string often contains something like this:

"The site https://example.com contains all the information you need"

So far I have this regex, which splits the string wherever http:// or https:// is found:

const rgx_link = /(?=https?:\/\/)/gi;

So in this case it would split the string into:

arr = ["The site ", "https://example.com contains all the information you need"];

I would like to modify the regex so that it also splits the string when a space is found after the URL.

The desired result would look like this:

arr = ["The site ", "https://example.com", " contains all the information you need"];

The new regex "should look" something like this: ((?=https?:\/\/)(\s)), but it doesn’t work.

Any help would be greatly appreciated. Thank you.

const text = "The site https://example.com contains all the information you need";
const rgx_link = /(?=https?:\/\/)/gi;
const result = text.split(rgx_link);
console.log(result);

Edit 1: Wiktor’s suggestion is correct. I didn’t notice that the leading part of the regex had changed, so it didn’t work at first.

const text = "The site https://example.com contains all the information you need";
const rgx_link = /(https?:\/\/\S*)/gi;
const result = text.split(rgx_link);
console.log(result);

3 Answers


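  1. const text =
        "The site https://example.com contains all the information you need";
    // The lookahead splits in front of the URL, so the URL stays in result[1]
    const rgx_link = /(?=https?:\/\/)/gi;
    const result = text.split(rgx_link);
    
    // The URL runs up to the first space; the rest keeps its leading space
    const url = result[1].split(" ")[0];
    const rest = result[1].slice(url.length);
    
    const res = [result[0], url, rest];
    
    console.log(res);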
    
    

    Is this what you need?

  2. You can use

    text.split(/(https?:\/\/\S*)/i)
    

    Note that the String#split method includes all captured substrings in its output when the regex contains capturing groups. Because this regex is wrapped entirely in a capturing group, it effectively tokenizes the string into URL and non-URL tokens.

    Note the absence of the g flag, since String#split behavior by default is to split on all occurrences of the regex pattern.
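
    As a minimal sketch of that capturing-group behaviour, here is the question's sample string split with and without the group. The variable names withGroup and withoutGroup are only illustrative.

    ```javascript
    const text = "The site https://example.com contains all the information you need";

    // With a capturing group, the matched URL is kept in the result array.
    const withGroup = text.split(/(https?:\/\/\S*)/i);
    console.log(withGroup);
    // ["The site ", "https://example.com", " contains all the information you need"]

    // With a non-capturing group, the URL acts as a plain separator and is dropped.
    const withoutGroup = text.split(/(?:https?:\/\/\S*)/i);
    console.log(withoutGroup);
    // ["The site ", " contains all the information you need"]
    ```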

    Pattern details

    • http – the literal string http
    • s? – an optional s char
    • :\/\/ – the literal substring :// (the slashes are escaped inside a regex literal)
    • \S* – zero or more non-whitespace chars.
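
    One consequence of splitting this way: when the URL sits at the very start or end of the string, split still emits the empty string next to it. If those empty tokens are not wanted, one possible approach is to filter them out. The helper name tokenize below is hypothetical, not part of the original answer.

    ```javascript
    // Hypothetical helper: split text into URL and non-URL tokens,
    // then drop the empty strings produced at the string boundaries.
    function tokenize(text) {
      return text.split(/(https?:\/\/\S*)/i).filter((part) => part !== "");
    }

    console.log(tokenize("https://example.com is the site"));
    // ["https://example.com", " is the site"]
    console.log(tokenize("Visit https://example.com"));
    // ["Visit ", "https://example.com"]
    ```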
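  3. In Python, you can search for the URL, then use the end of the match span to split the string right after the URL:

    import re
    
    data = """The site https://example.com contains all the information you need"""
    
    def find_url(text):
        # [^\s]* matches the run of non-whitespace characters after the scheme
        return re.search(r'https?://[^\s]*', text)
    
    # span() returns (start, end); end is the index just past the URL
    position = find_url(data).span()[1]
    
    print(position)
    
    # Slice to the end of the string; [position:-1] would drop the last character
    print(data[0:position], data[position:])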
    