I need to get the second part of the URI. The possible URIs are:
/api/application/v1/method
/web/application/v1/method
I can get "application" using `([^/api]\w*)` and `([^/web]\w*)`, but I know this is not the best approach. What would be a good way?
Thanks!
Edit: Thank you all for the input. The goal was to set the second part of the URI into a header in Apache with rewrite rules.
4 Answers
A general regex (Perl or PCRE syntax) solution would be:
Each section is delimited with `/`, so just capture as many non-`/` characters as there are. This is preferable to non-greedy regexes because it does not need to backtrack, and it allows for whatever else the sections may contain, which can easily include non-word characters such as `-` that won't be matched by `\w`.

Your pattern `([^/api]\w*)` consists of a capturing group and a negated character class that first matches a single character that is not `/`, `a`, `p` or `i`. After that, it matches a word character 0 or more times. The pattern could, for example, match only a single character that is not listed in the character class.
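To make the difference concrete, here is a minimal Python sketch (Python's `re` is used purely for illustration; the character-class behavior is the same in Perl and PCRE). The `^/[^/]+/([^/]+)` pattern is an assumed spelling of the "capture the non-`/` characters" idea, and the `my-app` path is a made-up example.

```python
import re

# Pattern from the question: [^/api] is a character class that excludes
# the single characters '/', 'a', 'p' and 'i'; it does not mean "not /api".
QUESTION_PATTERN = re.compile(r"([^/api]\w*)")

# Assumed spelling of the negated-class approach: skip the first section,
# then capture every character up to the next '/'.
SECOND_SECTION = re.compile(r"^/[^/]+/([^/]+)")


def first_match(path):
    """Return what the question's pattern captures first, or None."""
    m = QUESTION_PATTERN.search(path)
    return m.group(1) if m else None


def second_section(path):
    """Return the second path section, or None if there is none."""
    m = SECOND_SECTION.match(path)
    return m.group(1) if m else None


for path in ("/api/application/v1/method", "/api/my-app/v1/method"):
    print(path, first_match(path), second_section(path))
```

For `/api/application/v1/method` the question's pattern first captures `lication`, because `a` and `p` are excluded by the class, while the negated-class pattern returns the whole section, hyphens included.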
What you might do is use a capturing group and match `\w+`:
Explanation
- `^` Start of string
- `(?:api|web)` Non-capturing group with alternation. Match either api or web
- `(\w+)` Capturing group 1, match 1+ word chars
- `/v1/method` Match literally as in your example data.
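The pieces explained above can be assembled into one anchored pattern. The full pattern is not shown in this answer, so the version below is a reconstruction: the `/` separators between the pieces are an assumption, and `application_name` is a hypothetical helper name. Python is used only to make the sketch runnable.

```python
import re

# Reconstructed from the explained pieces; the '/' separators are assumed.
PATTERN = re.compile(r"^/(?:api|web)/(\w+)/v1/method")


def application_name(path):
    """Return capturing group 1 for a matching path, otherwise None."""
    m = PATTERN.match(path)
    return m.group(1) if m else None


print(application_name("/api/application/v1/method"))
print(application_name("/other/application/v1/method"))
```

Because the prefix is restricted to `api` or `web` and the tail is matched literally, any other path shape is rejected rather than partially matched.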
There are many ways to do this, and I'm not sure which one would be best, but it could be as simple as:
where the desired output is in the second capturing group, `$2`.
Example
Output
Advice
zdim advises that:
With all the regex, explicitly asked for, I’d like to bring up other approaches.
These also parse only a (URI style) path, like the regex ones, and return the second directory.
The most basic and efficient one: just split the string on `/`.
The `split` returns `''` first (before the first `/`), thus we need the third element. (Note that we can use an alternate delimiter for the separator pattern, it being a regex: `split m{/}, $path`.)
Use appropriate modules, for example `URI` or `Mojo::Path`.
What to use depends on details of what you do — if you’ve got any other work with URLs and web then you clearly want modules for that; otherwise they may (or may not) be an overkill.
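As a rough illustration of both routes, here is a Python sketch. Python's `str.split` behaves like the Perl `split` described above in that it also yields an empty first element, and `pathlib` and `urllib.parse` stand in for the Perl modules named in this answer; they are not equivalents of `URI` or `Mojo::Path`. The function names and the `example.com` URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def second_by_split(path):
    """Split on '/'; element 0 is the empty string before the leading '/'."""
    parts = path.split("/")  # ['', 'api', 'application', 'v1', 'method']
    return parts[2] if len(parts) > 2 else None


def second_by_pathlib(path):
    """Let a path class do the parsing; element 0 is the root '/'."""
    parts = PurePosixPath(path).parts  # ('/', 'api', 'application', ...)
    return parts[2] if len(parts) > 2 else None


def second_from_url(url):
    """For a full URL, take the path component first, then split it."""
    return second_by_split(urlsplit(url).path)


print(second_by_split("/api/application/v1/method"))
print(second_by_pathlib("/web/application/v1/method"))
print(second_from_url("https://example.com/api/application/v1/method?x=1"))
```

The split version needs no imports and no pattern; the library versions earn their keep once query strings, hosts, or other URL parts are also in play.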
I’ve benchmarked these for a sanity check of what one is paying with modules.
The `split` either beats regex by up to 10-15% (the regex using a negated character class and the one based on non-greedy `.+?` come out around the same), or is about the same as them. They are faster than `Mojo` by about 30%, and only `URI` lags seriously, by a factor of 5 behind `Mojo`.

That's for paths typical of real-life URLs, with a handful of short components. With only two very long strings (10k chars), `Mojo::Path` (surprisingly for me) is a factor of six ahead of `split` (!), which is ahead of the character-class regex by more than an order of magnitude.

The negated-character-class regex for such long strings beats the non-greedy (`.+?`) one by a factor of 3, good to know in its own right.

In all this the URI and Mojo objects were created once, ahead of time.
Benchmark code. I’d like to note that the details of these timings are far less important than the structure and quality of code.
With a (10-second) run this prints, on a laptop with v5.16
One should keep in mind that the overhead of the function call is very large for such a simple job, and in spite of `Benchmark`'s work these numbers are probably best taken as a cursory guide.
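For readers who want to run a comparable check themselves, here is a minimal sketch using Python's `timeit` rather than Perl's `Benchmark`. It is not the benchmark code from this answer, and its timings will not correspond to the figures quoted above. Both regex patterns are assumed spellings of the negated-class and non-greedy variants, and the function names are hypothetical.

```python
import re
import timeit

PATH = "/api/application/v1/method"

# Assumed spellings of the two regex variants being compared.
NEGATED = re.compile(r"^/[^/]+/([^/]+)")
NON_GREEDY = re.compile(r"^/.+?/(.+?)/")


def by_split(path=PATH):
    return path.split("/")[2]


def by_negated_class(path=PATH):
    return NEGATED.match(path).group(1)


def by_non_greedy(path=PATH):
    return NON_GREEDY.match(path).group(1)


# Each candidate is wrapped in a function call, so call overhead is part
# of what gets measured, as cautioned above.
for func in (by_split, by_negated_class, by_non_greedy):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__:18s} {seconds:.3f} s")
```

Swapping `PATH` for a much longer string is a quick way to see whether the ranking changes with input size.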