I have an identifier made of at least one of three parts, each of variable length. The first part is capital letters (matched by [A-Z]+), the second part is lowercase letters (matched by [a-z]+), and the third part is digits (matched by [0-9]+). The parts must occur in this specific order. So all of the following are correct identifiers and should be matched by the pattern: A, h, 3, Ab, AA9, abc23, Hyy2346. Examples of incorrect identifiers include wD, 99A, 6a5, and anything that contains non-alphanumeric characters.
The problem is that if I use, for example, [A-Z]*[a-z]*[0-9]* then the pattern also matches the empty string, which is not a correct identifier. How can I write a pattern so that the matched string has a length of at least 1?
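A minimal sketch of the problem, assuming the pattern is anchored with ^ and $ so that it has to cover the whole string (the variable name is only illustrative):

```javascript
// The all-optional pattern, anchored so it must cover the whole string.
const allOptional = /^[A-Z]*[a-z]*[0-9]*$/;

console.log(allOptional.test("Hyy2346")); // true
console.log(allOptional.test("wD"));      // false
console.log(allOptional.test(""));        // true, which is the unwanted case
```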
This is a simplified version of a more complicated problem in which I have to match entries in certain ancient scripts. I used [A-Z]+ and so on above to illustrate my sub-patterns, but the actual sub-patterns are much more complicated. Nevertheless, they are all of the form (a class of characters and words)+. Also, I am doing this in JavaScript, and I need to do it in a single regex.
I could try something like ([A-Z]+[a-z]*[0-9]*|[A-Z]*[a-z]+[0-9]*|[A-Z]*[a-z]*[0-9]+), but because my actual sub-patterns are much longer than [A-Z]+ and the like, the result is very long, and I am wondering whether there is a more efficient way to write it.
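For illustration, the alternation does not have to be typed out by hand: it can be assembled from the sub-pattern sources, which keeps the code short even though the resulting regex is still long. This is only a sketch; the names parts, alternatives and alternationPattern are hypothetical, and the three character classes stand in for the real, longer sub-patterns.

```javascript
// Sub-pattern sources without quantifiers (stand-ins for the real, longer ones).
const parts = ["[A-Z]", "[a-z]", "[0-9]"];

// For each position i, require part i with "+" and leave the others optional with "*".
const alternatives = parts.map((_, i) =>
  parts.map((p, j) => `(?:${p})${j === i ? "+" : "*"}`).join("")
);

// Still a single regex, anchored so it must cover the whole string.
const alternationPattern = new RegExp(`^(?:${alternatives.join("|")})$`);

console.log(alternationPattern.test("AA9")); // true
console.log(alternationPattern.test("99A")); // false
console.log(alternationPattern.test(""));    // false
```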
3 Answers
Add a positive lookahead that matches one character at the beginning.
Or just check that the string isn’t empty before testing it with the regexp.
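The second suggestion could look roughly like this. Note that it is not a single regex, so it only helps if that requirement can be relaxed; isIdentifier and optionalParts are hypothetical names used for the sketch.

```javascript
// The all-optional pattern from the question, anchored to the whole string.
const optionalParts = /^[A-Z]*[a-z]*[0-9]*$/;

// Hypothetical helper: reject the empty string first, then apply the pattern.
function isIdentifier(s) {
  return s.length > 0 && optionalParts.test(s);
}

console.log(isIdentifier("abc23")); // true
console.log(isIdentifier(""));      // false
console.log(isIdentifier("6a5"));   // false
```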
One way could be to make a positive lookahead to check that the string contains at least one character:
(?=.)
Here (?= opens a positive lookahead and . matches any single character.
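Put together with the pattern from the question, this might look as follows. The sketch assumes the regex is anchored with ^ and $ so that it tests the whole string; the lookahead then only has to rule out the empty string, and the sub-patterns themselves stay optional and appear only once. The variable names are illustrative.

```javascript
// (?=.) requires at least one character; the rest enforces the order of the parts.
const lookaheadPattern = /^(?=.)[A-Z]*[a-z]*[0-9]*$/;

const valid = ["A", "h", "3", "Ab", "AA9", "abc23", "Hyy2346"];
const invalid = ["", "wD", "99A", "6a5", "a-b"];

console.log(valid.every((s) => lookaheadPattern.test(s)));  // true
console.log(invalid.some((s) => lookaheadPattern.test(s))); // false
```

If the pattern is used without anchors to search inside a longer text, (?=.) alone would still allow zero-length matches, so a lookahead for the first character of one of the parts, such as (?=[A-Za-z0-9]), may be the safer choice there.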