
Given some urls inside an HTML page, I want to rewrite some of them as follows:

Example url: https://example.com/cost-center/sub-one/article1
From that url I want to replace the text between /cost-center/ and the last part of the url (article1) with another text (test).

That means the above url would be converted to: https://example.com/cost-center/test/article1.

In my case there can be more parts after /cost-center/ and the url can end with a slash or can be inside quotes as in the examples below:

https://example.com/cost-center/sub-one/sub-two/article-3/
https://example.com/cost-center/sub-one/sub-three/article-4
https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/
'https://example.com/cost-center/sub-one/sub-two/article-3/'
'https://example.com/cost-center/sub-1/sub-two/sub-three/article-5'
"https://example.com/cost-center/sub-one/sub-three/article-4"
"https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/"

Those should be replaced as follows:

https://example.com/cost-center/test/article-3/
https://example.com/cost-center/test/article-4
https://example.com/cost-center/test/article-5/
'https://example.com/cost-center/test/article-3/'
'https://example.com/cost-center/test/article-5'
"https://example.com/cost-center/test/article-4"
"https://example.com/cost-center/test/article-5/"

For now, let’s assume that there is at least one and at most three parts between /cost-center/ and the last part of the url, e.g.:

https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/

So basically I want to replace some parts of it while keeping the last part.
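To pin the rule down in code, here is a minimal sketch that is not part of the original question or answers. It handles a single bare url. The function name `replaceMiddleSegments` is hypothetical. The sketch assumes /cost-center/ appears in the url and that any surrounding quotes have already been removed. It keeps everything up to /cost-center/, replaces all middle parts with one replacement, and keeps the last part plus any trailing slash. It accepts any number of middle parts, not only one to three.

```php
<?php
// Hypothetical helper (not from the thread): rewrites one bare url so that everything
// between "/cost-center/" and the last part becomes $replacement.
function replaceMiddleSegments(string $url, string $replacement = 'test'): string
{
    $marker = '/cost-center/';
    $pos = strpos($url, $marker);
    if ($pos === false) {
        return $url; // not a cost-center url: leave it unchanged
    }

    $prefixLength = $pos + strlen($marker);
    $prefix = substr($url, 0, $prefixLength); // e.g. "https://example.com/cost-center/"
    $rest = substr($url, $prefixLength);      // e.g. "sub-one/sub-two/article-3/"

    // Remember a trailing slash so it can be put back after the last part.
    $trailingSlash = substr($rest, -1) === '/' ? '/' : '';
    $parts = explode('/', rtrim($rest, '/'));

    if (count($parts) < 2) {
        return $url; // no middle part to replace
    }

    return $prefix . $replacement . '/' . end($parts) . $trailingSlash;
}

$samples = [
    'https://example.com/cost-center/sub-one/article1',
    'https://example.com/cost-center/sub-one/sub-two/article-3/',
    'https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/',
];

// Prints each sample with its middle parts replaced by "test".
foreach ($samples as $url) {
    echo replaceMiddleSegments($url), PHP_EOL;
}
```

This sketch works on one bare url at a time, so it does not find the urls inside an HTML page by itself. That step still needs a pattern that locates them.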

I have tried numerous regexes, such as:

preg_replace('~https://example.com/cost-center/[^/]+/([^/]+)~', 'https://example.com/cost-center/test/$1', $url);

preg_replace('/(["\']?)(https:\/\/[^\/]+\/)([^\/]+)(\/[^"\s]*)?/', '$1$2test$4$1', $url);

I have also tried splitting the url with explode and parsing it manually piece by piece, but it turned out very complicated and ugly.

ChatGPT did not give good results either.


Answers


  1. I tried the following:
    The regex captures three groups:

    1. The beginning of the url up to /cost-center/
    2. Everything between /cost-center/ and the last part of the url
    3. The last part of the url, which may or may not end with a slash
    $pattern = '~(https://example\.com/cost-center/)(.*?)([^/]+/?$)~';
    $replacement = '$1test/$3';
    $result = preg_replace($pattern, $replacement, $url);
    

    In the replacement string, you keep the first and third groups ($1 and $3) and replace the second group with test/. This replaces the text between /cost-center/ and the last part of the url with test/.

    EDIT: I modified the regex to include the quotes and the attribute name, for cases where the URL is inside an HTML attribute such as href:

    $pattern = '~(href=["\']https://example\.com/cost-center/)(.*?)([^/]+/?["\'])~';
    
  2. From your description of the task and your sample data, it doesn’t actually matter if/how a URL is quote-wrapped. You merely need to match the leading part of the URL to validate that it IS a URL, then isolate the unwanted substring and replace it.

    Notice that my replacement value is only the string test/, with no references to capture groups. This is because \K forgets/releases all characters matched up to that point, and (?= ... ) is a lookahead, meaning that it doesn’t consume any characters.

    To isolate the portion of the pattern to be replaced, I am using a negated character class containing forward slashes and whitespace characters, followed by a literal forward slash. That subpattern may repeat one or more times (because of the + quantifier). The lookahead (?=article) then requires the last part to start with article, as it does in your sample data.

    Code:

    echo preg_replace('#https://[^/]+/cost-center/\K([^/\s]+/)+(?=article)#', 'test/', $text);
    
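The two answers' patterns can be compared on the sample data with a short script like the one below. It is a sketch, not code from either answer. The patterns are the posted ones written as valid PHP single-quoted strings. For answer 1, that means switching to `~` as the delimiter so the slashes need no escaping. For answer 2, it means restoring the backslashes in `\K` and `\s`. The HTML snippet is made up for illustration.

```php
<?php
// Answer 1, first version: anchored with $, so it expects a string that holds one bare url.
$answer1Url = '~(https://example\.com/cost-center/)(.*?)([^/]+/?$)~';

// Answer 1, EDIT version: matches only inside a quoted href attribute.
$answer1Href = '~(href=["\']https://example\.com/cost-center/)(.*?)([^/]+/?["\'])~';

// Answer 2: \K drops everything matched so far from the replacement, and the
// lookahead assumes the last part starts with "article", as in the sample data.
$answer2 = '#https://[^/]+/cost-center/\K([^/\s]+/)+(?=article)#';

$url = 'https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/';
echo preg_replace($answer1Url, '$1test/$3', $url), PHP_EOL;

// Made-up snippet with both quote styles, for illustration only.
$html = <<<'HTML'
<a href="https://example.com/cost-center/sub-one/sub-three/article-4">A</a>
<a href='https://example.com/cost-center/sub-one/sub-two/article-3/'>B</a>
HTML;

echo preg_replace($answer1Href, '$1test/$3', $html), PHP_EOL;
echo preg_replace($answer2, 'test/', $html), PHP_EOL;
```

The trade-off is visible here. Answer 1's first pattern is anchored with `$`, so it only fits a string holding a single url, and its EDIT variant only matches inside `href` attributes. Answer 2's pattern runs over the whole page regardless of quoting, but it relies on the last part starting with `article`.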