
I'm trying to use a regular expression to find website URLs, using this pattern:

const RegEx = /h[\w]+[:\/\/]+\w+[.]+[\w.=-]+[.]+[\w]{2,3}[\w\/=-]+\\/g

const Websites = document.children[0].innerHTML.match(RegEx)

console.log(Websites)

//["https://www.linuxfoundation.org/cookies\\","https://de.wikipedia.org\\",...]

It should match "https://de.wikipedia.org\" because of the \\ at the end of the pattern, where the first backslash escapes the second.

But when I execute this, it only logs websites with "\\" at the end.
It works on regex101.com but not in the browser.

It should find domains with a single \ at the end!
I tried using new RegExp and [\\], but both returned only results with \\ at the end.

3 Answers


  1. It still runs OK for me:

    -> Console result

    const RegEx = /h[\w]+[:\/\/]+\w+[.]+[\w.=-]+[.]+[\w]{2,3}[\w\/=-]+\\/g
    
    // In a string literal, \\ is one backslash, which is what \\ in the pattern matches.
    const result = RegEx.exec('//["https://www.linuxfoundation.org/cookies\\","https://de.wikipedia.org\\",...]');
    
    console.log(result[0]);
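A slightly more defensive variant of the snippet above, offered as a sketch: with the `g` flag, `String.prototype.matchAll` collects every match rather than just the first, and it yields an empty list instead of `null` when nothing matches. The sample string and the names `urlWithBackslash`, `sample` and `allMatches` are assumptions standing in for the page's HTML.

```javascript
// The pattern from the question, with its backslashes written out explicitly.
const urlWithBackslash = /h[\w]+[:\/\/]+\w+[.]+[\w.=-]+[.]+[\w]{2,3}[\w\/=-]+\\/g;

// Hypothetical sample input; '\\' in a string literal is ONE backslash character.
const sample = '["https://www.linuxfoundation.org/cookies\\","https://de.wikipedia.org\\"]';

// matchAll requires the g flag and returns an iterator of match objects.
const allMatches = [...sample.matchAll(urlWithBackslash)].map(m => m[0]);

// Printing each string on its own shows the raw text, without display escaping.
for (const match of allMatches) {
  console.log(match);
}
```

Logging the strings one by one, rather than the array as a whole, avoids the quoted and escaped representation that makes a single backslash look doubled.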
  2. In your regular expression pattern, you're using \\ to match a single backslash (\). However, when you inspect the matched results, the console escapes the backslash again for display, so you see \\ in the output.

    If you want to match URLs with a single backslash at the end and see them with a single \ in the output, keep \\ in the pattern and then post-process the matched results. Here's how you can do it:

    const RegEx = /h[\w]+[:\/\/]+\w+[.]+[\w.=-]+[.]+[\w]{2,3}[\w\/=-]+\\/g
    
    const matchedUrls = document.children[0].innerHTML.match(RegEx)
    
    // Now, you can replace double backslashes with single backslashes
    const cleanedUrls = matchedUrls.map(url => url.replace(/\\\\/g, '\\'));
    
    console.log(cleanedUrls);
    

    In this code, after matching the URLs, map replaces each double backslash (\\) with a single backslash (\). If a matched string already contains only one backslash and the console merely displays it escaped, the replace leaves it unchanged.
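To check what a match really contains, independent of how it is displayed, one can look at its length and last character code. A minimal sketch, assuming a hard-coded sample string in place of a real match (`matched` and `displayed` are illustrative names):

```javascript
// In source code, '\\' is the escape sequence for ONE backslash character.
const matched = 'https://de.wikipedia.org\\';

console.log(matched.length);                         // counts the backslash once
console.log(matched.endsWith('\\'));                 // true: the string ends with one backslash
console.log(matched.charCodeAt(matched.length - 1)); // 92, the character code of a backslash

// JSON.stringify escapes the backslash in its output; this is a display-level
// escape, similar to the doubled backslashes shown in the question's array output.
const displayed = JSON.stringify(matched);
console.log(displayed);
```

The stored string and its quoted representation differ by exactly one extra backslash plus the two surrounding quotes.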

    Discussion:
    As suggested, try a regex of this form: 'https?:\/\/(?:[a-z0-9-]{1,63}\.)+(?:[a-z0-9-]{1,63})+'.

  3. Your regular expression has many issues that need to be addressed before attempting to solve your problem.

    You have:

    const RegEx = /h[\w]+[:\/\/]+\w+[.]+[\w.=-]+[.]+[\w]{2,3}[\w\/=-]+\\/g
    

    And parts of it don't make sense. For example, [:\/\/]: square brackets define a set of characters that you want to match. There is no reason to have the same character more than once in the set, and the characters used here do not need to be escaped. Your [:\/\/] is equivalent to [:/].
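The equivalence of the two character sets can be checked directly. In this sketch the test strings, the `^...$` anchors and the variable names are made up for illustration:

```javascript
// Duplicates and escapes inside a character set do not change what it matches.
const withDuplicates = /^[:\/\/]+$/;
const minimal = /^[:/]+$/;

// Invented inputs: some made only of ':' and '/', some not.
const samples = ['://', ':', '/:/', 'a://', ''];

for (const s of samples) {
  console.log(JSON.stringify(s), withDuplicates.test(s), minimal.test(s));
}

// True when both patterns agree on every sample.
const sameResults = samples.every(s => withDuplicates.test(s) === minimal.test(s));
console.log(sameResults);
```

Note that both sets also accept strings such as '/:/', which is one reason a literal sequence is a better fit for the protocol separator than a character set.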

    I assume you want to match http:// or https://. The regular expression https?:\/\/ is used for that (the slashes are escaped because the pattern sits inside a /.../ literal). The question mark makes the "s" optional.

    The next part, I assume, is used to match the domain name. \w is a character class that matches a-z, A-Z, 0-9 and _ (underscore). Domain names cannot have underscores. Each label of a valid ASCII domain name can contain a-z, A-Z, 0-9 and - (hyphen), and can be between 1 and 63 characters long. The expression [a-zA-Z0-9-]{1,63} matches one such label.
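A quick sketch of the difference between `\w` and the explicit label set. The sample labels are invented, and the `^...$` anchors are added here only so that each test covers the whole string:

```javascript
// \w accepts underscores but not hyphens; the explicit set does the opposite.
const wordChars = /^\w+$/;
const domainLabel = /^[a-zA-Z0-9-]{1,63}$/;

// Hypothetical labels, including one that is 64 characters long.
const tooLong = 'a'.repeat(64);
const labels = ['wikipedia', 'my-site', 'my_site', tooLong];

const report = labels.map(label => ({
  length: label.length,
  matchesWord: wordChars.test(label),
  matchesLabel: domainLabel.test(label),
}));

console.log(report);
```

The hyphenated label passes only the explicit set, the underscored one passes only `\w`, and the 64-character label fails the `{1,63}` length bound.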

    A website usually has one or more subdomains under the top-level domain. By placing the expression in a group, it can match multiple instances: (?:[a-zA-Z0-9-]{1,63}\.)+[a-zA-Z0-9-]{1,63} matches one or more subdomains followed by the top-level domain.

    We now have /https?:\/\/(?:[a-zA-Z0-9-]{1,63}\.)+[a-zA-Z0-9-]{1,63}/ that matches the protocol (http or https) and the domain name.

    Now we can add your requirements. As I understand it from your examples, you want to match the string https://de.wikipedia.org\. Then we add \\ to the end of the expression (because backslashes must be escaped), resulting in /https?:\/\/(?:[a-zA-Z0-9-]{1,63}\.)+[a-zA-Z0-9-]{1,63}\\/

    You can test it below:

    const re = /https?:\/\/(?:[a-zA-Z0-9-]{1,63}\.)+[a-zA-Z0-9-]{1,63}\\/
    
    input.oninput = ()=> output.value = input.value.match(re)
    <input id="input">
    <output id="output">

    Note! The domain match is not complete. There are rules for where the hyphen can be placed, and there are rules for how many subdomains there can be. With international domain names, almost any character from any script can be used. The example provided does not attempt to implement a full domain validation and does not support IDNs unless they are in punycode.
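One of the hyphen rules mentioned above is that a label may not start or end with a hyphen. The following sketch tightens the label pattern to enforce that single rule and nothing more. The names `strictLabel` and `strictHost` and the candidate URLs are hypothetical, and the trailing backslash from the question is left out to keep the focus on the labels:

```javascript
// A label: starts and ends with a letter or digit, hyphens only in between, 1 to 63 chars.
const strictLabel = '[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?';

// Whole string: protocol, one or more "label." groups, then a final label.
// Backslashes are doubled because the pattern is built from a string.
const strictHost = new RegExp(`^https?:\\/\\/(?:${strictLabel}\\.)+${strictLabel}$`);

const candidates = [
  'https://de.wikipedia.org',
  'http://my-site.example',
  'https://-bad.example',
  'https://bad-.example',
  'https://localhost',
];

for (const url of candidates) {
  console.log(url, strictHost.test(url));
}
```

This still does not validate top-level domains, total name length, or internationalized names; it only illustrates how one placement rule can be expressed in the pattern.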
