
I have a string that contains text and some links. I want to remove every link whose href is not an absolute URL, but keep the link text.

I want to do it with a regex instead of loading the DOM.

Is that possible?

I have tried this regex:

preg_replace('#href=["\']([^/][^\':"]*)["\']#', $root_path.'$1', $html);

But this only replaces the href with an empty string.

I want to remove the anchor tags completely and keep only their text.

2 Answers


  1. You can use this regex, but take it with a pinch of salt! It’s far from perfect, so try it on your use case, and if anything goes wrong, leave a comment and I’ll try to fix the regex for you!

    <?php
    
    $re = '/<a(\s+[\w-]+?(=([\'"].*?[\'"])|([\w\d]+?))?)*\s+href=[\'"](?!https?:\/\/)(?<link>[^\'"]+?)[\'"](\s+[\w-]+?(=([\'"].*?[\'"])|([\w\d]+?))?)*>(?<text>.+?)<\/a>/m';
    $str = '<a class="aaa" href="/example" data-enabled>Hello</a> <a href="https://www.example.com">world!</a>';
    
    $string = '<a href="/example">Hello</a> <a href="https://www.example.com">world!</a>';
    
    $out = preg_replace_callback($re,
     function($matches) {
        return $matches['text'];
     }, $string);
     
    var_dump($out);
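
One caveat worth checking: the lookahead in the pattern above only treats `http://` and `https://` as absolute, so `mailto:` links and protocol-relative `//host/path` links would also be unwrapped. The following is a minimal sketch of a broader check, not the answerer's method. The function name `unwrap_relative_links` is hypothetical, and it assumes that "absolute" means "starts with a URL scheme or with `//`".

```php
<?php
// Hypothetical helper: unwrap <a> tags whose href has no URL scheme and is
// not protocol-relative, keeping only the link text.
function unwrap_relative_links(string $html): string
{
    // (["\'])   capture the opening quote so \1 can require the same closing quote
    // (?!...)   skip hrefs that start with "scheme:" or with "//"
    // (.*?)     lazy, so each anchor is matched on its own
    $re = '~<a\b[^>]*?\shref=(["\'])(?![a-z][a-z0-9+.-]*:|//)[^"\']*\1[^>]*>(.*?)</a>~is';

    // preg_replace returns null on a PCRE error; fall back to the input then.
    return preg_replace($re, '$2', $html) ?? $html;
}

$html = '<a class="aaa" href="/example" data-enabled>Hello</a> '
      . '<a href="https://www.example.com">world!</a> '
      . '<a href="mailto:me@example.com">mail</a> '
      . '<a href="//cdn.example.com/x">cdn</a>';

// Only the first anchor is relative, so only "Hello" loses its tag.
echo unwrap_relative_links($html), PHP_EOL;
```

Like any regex over HTML, this sketch assumes reasonably regular markup: quoted href values and no `>` inside attribute values.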
    
  2. You can simplify your regex:

    $text = "Lorem <a href='#'>text</a>
    Lorem2 <a href='#'>text2</a>";
    $text = preg_replace_callback('/<a .*>(.*)<\/a>/mi', function($matches) {
        return $matches[1];
    }, $text);
    
    var_dump($text);
    

    The output will look like this:

    string(24) "Lorem text
    Lorem2 text2"
    

    Run in sandbox:

    https://onlinephp.io/c/3bda8
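
Two things are worth keeping in mind with this simplified pattern: it has no check on the href, so it unwraps every link, absolute ones included, and its greedy `.*` only behaves because each line of the sample holds a single link. The sketch below is an illustration under those observations, not part of the answer. It compares the greedy form with a lazy variant on one line that holds two links. The variable names `$greedy` and `$lazy` are made up for the example.

```php
<?php
// Two links on the same line: the case where greedy and lazy matching differ.
$text = "A <a href='#'>one</a> B <a href='#'>two</a>";

// Greedy: the first .* runs on to the last opening tag, so everything
// between the first <a and the final </a> collapses into one match.
$greedy = preg_replace('/<a .*>(.*)<\/a>/i', '$1', $text);

// Lazy, with [^>]* for the attributes: each anchor is matched separately.
$lazy = preg_replace('/<a\s[^>]*>(.*?)<\/a>/is', '$1', $text);

var_dump($greedy); // string(5) "A two"
var_dump($lazy);   // string(11) "A one B two"
```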
