Given some urls inside an html page, I want to replace some urls as below:
Example url: https://example.com/cost-center/sub-one/article1
From that one I want to replace the text that is between /cost-center/ and the last part of the url (article1) with another text (test).
That means the above url would be converted to: https://example.com/cost-center/test/article1.
In my case there can be more parts after /cost-center/ and the url can end with a slash or can be inside quotes as in the examples below:
https://example.com/cost-center/sub-one/sub-two/article-3/
https://example.com/cost-center/sub-one/sub-three/article-4
https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/
'https://example.com/cost-center/sub-one/sub-two/article-3/'
'https://example.com/cost-center/sub-1/sub-two/sub-three/article-5'
"https://example.com/cost-center/sub-one/sub-three/article-4"
"https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/"
Those would be replaced as below:
https://example.com/cost-center/test/article-3/
https://example.com/cost-center/test/article-4
https://example.com/cost-center/test/article-5/
'https://example.com/cost-center/test/article-3/'
'https://example.com/cost-center/test/article-5'
"https://example.com/cost-center/test/article-4"
"https://example.com/cost-center/test/article-5/"
For now let’s assume that the url will have at least one and at most three parts after /cost-center/; e.g. https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/
So basically I want to replace some parts of it while keeping the last part.
I have tried numerous regexes such as:
preg_replace('~https://example.com/cost-center/[^/]+/([^/]+)~', 'https://example.com/cost-center/test/$1', $url);
preg_replace('/(["']?)(https://[^/]+/)([^/]+)(/[^"s]*)?/', '$1$2test$4$1', $url);
I have also tried splitting the url with explode and parsing it manually piece by piece, but it turned out very complicated and ugly.
ChatGPT did not give good results either.
2 Answers
I tried the following:
The regex captures 3 groups: everything up to and including /cost-center/, the text between /cost-center/ and the last part of the url, and the last part of the url.
In the replacement string, you keep the first and third groups ($1 and $3) and replace the second group with test/. It will replace the text between /cost-center/ and the last part of the url with test/.
EDIT: I modified the regex to include the quotes and the attribute name for cases where the URL is inside an HTML attribute such as href
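A minimal sketch of the three-group approach described above. The exact pattern is an assumption reconstructed from the description, not necessarily the one originally posted. The host is fixed to example.com, the function name replaceWithGroups is hypothetical, and quotes and angle brackets are excluded from path segments so the match stops at the end of the URL inside HTML.

```php
<?php
// Hypothetical helper: keeps group 1 (prefix) and group 3 (last segment),
// and swaps group 2 (everything in between) for "$replacement/".
function replaceWithGroups(string $html, string $replacement = 'test'): string
{
    // One path segment: no slash, whitespace, quote, or angle bracket.
    $seg = '[^/\s"\'<>]+';

    // Group 1: everything up to and including /cost-center/
    // Group 2: one or more intermediate segments, each followed by a slash
    // Group 3: the last segment, with an optional trailing slash
    $pattern = '~(https://example\.com/cost-center/)((?:' . $seg . '/)+)(' . $seg . '/?)~';

    // ${1} and ${3} are written with braces so a replacement that starts
    // with a digit cannot be misread as part of the group number.
    $result = preg_replace($pattern, '${1}' . $replacement . '/${3}', $html);

    // preg_replace returns null on error; fall back to the input in that case.
    return $result ?? $html;
}

echo replaceWithGroups('"https://example.com/cost-center/sub-1/sub-two/sub-three/article-5/"'), "\n";
// prints "https://example.com/cost-center/test/article-5/" (including the double quotes)
```

A URL with only one part after /cost-center/ has nothing between the prefix and the last segment, so it does not match and is left unchanged.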
From your description of the task and your sample data, it doesn’t actually matter if/how a URL is quote-wrapped. You merely need to match the leading part of the URL to validate that it IS a URL, then isolate the unwanted substring and replace it.
Notice that my replacement value is only the string test and no references to capture groups. This is because \K will forget/release all characters matched up to that point, and (?= ... ) is a lookahead meaning that it doesn’t consume any of the matched characters.
As for isolating the portion of the pattern to be replaced, I am using a negated character class containing forward slashes and whitespaces then a literal forward slash. That subpattern may repeat one or more times (because of the + quantifier).
Code: (Demo)
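A minimal sketch of the \K plus lookahead technique, under stated assumptions: the host is fixed to example.com, the function name replaceWithKeepOut is hypothetical, and the exact pattern is my own arrangement rather than the one from the demo. The repetition here is written so that the slash before the last segment stays outside the match, which is what lets the replacement be the bare string test.

```php
<?php
// Hypothetical helper: matches only the middle segments, so no capture
// groups are needed in the replacement.
function replaceWithKeepOut(string $html, string $replacement = 'test'): string
{
    // One path segment: no slash, whitespace, quote, or angle bracket.
    $seg = '[^/\s"\'<>]+';

    $pattern = '~https://example\.com/cost-center/'
        . '\K'                             // forget everything matched so far
        . $seg . '(?:/' . $seg . ')*'      // the middle segments to be replaced
        . '(?=/' . $seg . ')'              // must be followed by "/last-part" (not consumed)
        . '~';

    $result = preg_replace($pattern, $replacement, $html);

    // preg_replace returns null on error; fall back to the input in that case.
    return $result ?? $html;
}

echo replaceWithKeepOut("'https://example.com/cost-center/sub-one/sub-two/article-3/'"), "\n";
// prints 'https://example.com/cost-center/test/article-3/' (including the single quotes)
```

Because the lookahead requires a slash followed by at least one more segment character, a trailing slash at the end of the URL is never mistaken for the separator before the last part.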