
I’m trying to search for and replace all URLs in a text. However, I need to exclude URLs that contain certain words. For example, I use (?!word)(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3}). This regex finds all URLs, but it does not exclude the URLs that contain word. Please help me finish this regex.

I tried:

const text = `This is a sample text with links: www.example.com, https://www.hello.com, http://www.google.com, https://word.com, http://word.com
www.word.com https://www.site.frs www.word.com asasd.cds qdwdew.www asd.cdd 
sdsd https://www.word.com
https://www.word.frs some text hello word https www.example.wfd asdasd wwwsds.dsad.word.com 
`;

const regex = /(?!word)(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/gm;

const matches = text.replace(regex, '[xxx]');

console.log(matches);

I got this result:

This is a sample text with links: [xxx], [xxx], [xxx], [xxx], [xxx]
[xxx] [xxx] [xxx] [xxx] [xxx] [xxx] 
sdsd [xxx]
[xxx] some text hello word https [xxx] asdasd [xxx] 

I want to get this result:

This is a sample text with links: [xxx], https://www.hello.com, [xxx], https://word.com, http://word.com
www.word.com [xxx] www.word.com [xxx] [xxx] [xxx] 
sdsd https://www.word.com
https://www.word.frs some text hello word https [xxx] asdasd wwwsds.dsad.word.com

2 Answers


  1. You could pass a callback function to String#replace instead, which checks if the match contains the given word to decide whether or not to replace.

    const text = `This is a sample text with links: www.example.com, https://www.hello.com, http://www.google.com, https://word.com, http://word.com
    www.word.com https://www.site.frs www.word.com asasd.cds qdwdew.www asd.cdd 
    sdsd https://www.word.com
    https://www.word.frs some text hello word https www.example.wfd asdasd wwwsds.dsad.word.com 
    `;
    
    const regex = /(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/gm;
    
    const matches = text.replace(regex, m => m.includes('word') ? m : '[xxx]');
    console.log(matches);
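
For context, a lookahead is evaluated only at the position where a match attempt starts, so the (?!word) in the question's pattern merely prevents a match from beginning with the letters word. A small sketch of that behavior (the two sample strings are illustrative and not taken from the question):

```javascript
// The question's pattern, with the leading (?!word) lookahead.
const original = /(?!word)(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/g;

// The lookahead is checked at index 0, where the text starts with "https",
// so it passes and the whole URL is still matched.
const withProtocol = 'https://word.com'.replace(original, '[xxx]');

// Here the attempt at index 0 is rejected, but the engine retries one
// character later and matches "ord.com" instead.
const bareHost = 'word.com'.replace(original, '[xxx]');

console.log(withProtocol); // [xxx]
console.log(bareHost);     // w[xxx]
```

This is why filtering each match in a callback, as above, is the simpler route.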
  2. Arguably, the best way to do this is to specify a callback as replacement:

    const masked = text.replace(regex, $0 => $0.includes('word') ? $0 : '[xxx]');
    

    …or, if you want to avoid false positives like sword:

    const masked = text.replace(regex, $0 => /\bword\b/.test($0) ? $0 : '[xxx]');
    

    Try it:

    const text = `This is a sample text with links: www.example.com, https://www.hello.com, http://www.google.com, https://word.com, http://word.com
    www.word.com https://www.site.frs www.word.com asasd.cds qdwdew.www asd.cdd 
    sdsd https://www.word.com
    https://www.word.frs some text hello word https www.example.wfd asdasd wwwsds.dsad.word.com
    https://this.is.a.sword.com
    `;
    
    const regex = /(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/g;
    
    // Keeps every match that contains "word" as a substring (so "sword" is kept too).
    const masked1 = text.replace(regex, $0 => $0.includes('word') ? $0 : '[xxx]');
    // Keeps only matches that contain "word" as a whole word.
    const masked2 = text.replace(regex, $0 => /\bword\b/.test($0) ? $0 : '[xxx]');
    
    console.log(masked1);
    console.log(masked2);
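
Since the question mentions several words to exclude, the callback approach could be generalized to a list. The following is only a sketch: maskUrls and its excludedWords parameter are hypothetical names introduced here, and the sample string is illustrative.

```javascript
// Hypothetical helper: masks URL-like matches unless they contain one of
// the excluded words as a whole word.
function maskUrls(text, excludedWords) {
  const urlRegex = /(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/g;
  if (excludedWords.length === 0) return text.replace(urlRegex, '[xxx]');
  // Escape regex metacharacters so that each word is matched literally.
  const escaped = excludedWords.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Non-global regex, so repeated test() calls carry no lastIndex state.
  const keep = new RegExp(`\\b(?:${escaped.join('|')})\\b`);
  return text.replace(urlRegex, m => (keep.test(m) ? m : '[xxx]'));
}

const sample = 'www.example.com https://www.hello.com http://word.com https://this.is.a.sword.com';
console.log(maskUrls(sample, ['word', 'hello']));
// [xxx] https://www.hello.com http://word.com [xxx]
```

Building the alternation once, outside the callback, avoids recompiling the filter for every match.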

    Alternatively, here’s a pure regex solution:

    (?!\S*\bword\b)\b(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})
    

    …in which (?!\S*\bword\b) ensures that the URL we are about to (try to) match does not contain word. The \b after the lookahead is needed so that a match cannot start in the middle of a URL and cover only half of it.
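
To see what the \b contributes, here is a minimal comparison of the pattern with and without it. The variable names and the short demo string are assumptions made for illustration only.

```javascript
// Same pattern twice; the only difference is the \b after the lookahead.
const withoutBoundary = /(?!\S*\bword\b)(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/g;
const withBoundary = /(?!\S*\bword\b)\b(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/g;

const demo = 'https://word.com www.example.com';

// Without \b, the engine can start a match right after the "w" of "word",
// where "word" no longer lies ahead, and it masks "ord.com".
console.log(demo.replace(withoutBoundary, '[xxx]')); // https://w[xxx] [xxx]

// With \b, a match may only start at a word boundary, so the excluded URL
// stays intact.
console.log(demo.replace(withBoundary, '[xxx]')); // https://word.com [xxx]
```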

    Try it on regex101.com.

    Try it:

    const text = `This is a sample text with links: www.example.com, https://www.hello.com, http://www.google.com, https://word.com, http://word.com
    www.word.com https://www.site.frs www.word.com asasd.cds qdwdew.www asd.cdd 
    sdsd https://www.word.com
    https://www.word.frs some text hello word https www.example.wfd asdasd wwwsds.dsad.word.com
    https://this.is.a.sword.com
    `;
    
    // The lookahead rejects any start position followed by "word" as a whole word.
    const regex = /(?!\S*\bword\b)\b(?:https?:\/\/)?(?:[a-zA-Z0-9.]+)\.(?:[a-zA-Z]{2,3})/g;
    
    const masked = text.replace(regex, '[xxx]');
    
    console.log(masked);