I have a bunch of product titles, from which I need to extract the SKU that’s within them.
So take the following titles as an example:
- 258 Game of Thrones
- E457 Pokemon
- 293A Wool Bed cover
- 572 C Steel frame whatever
So in the above examples, the SKUs are `258`, `E457`, `293A` and `572 C` respectively.
Generally, the SKU is either all digits (mainly 3 or 4 of them), or the letter E followed by 3-4 digits, or a 3-4 digit number followed by a single letter, or by a single space and a single letter.
So I came up with this pattern that seems to work well in identifying all of the above cases: `/^E?\d+ ?.?/`
https://regex101.com/r/I7kkDP/2
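A minimal sketch of how this pattern behaves on the four sample titles, assuming PHP's `preg_match`; the helper name `match_sku_prefix` is hypothetical and only there for illustration. Note that the trailing `.?` accepts any character, so for the first two titles the match also picks up the first letter of the next word.

```php
<?php
// Hypothetical helper: returns whatever the pattern matches at the
// start of a title, or null when nothing matches.
function match_sku_prefix(string $title): ?string
{
    // E?   optional leading "E"
    // \d+  one or more digits
    //  ?   optional single space
    // .?   optional single character of any kind
    if (preg_match('/^E?\d+ ?.?/', $title, $m) === 1) {
        return $m[0];
    }
    return null;
}

$titles = [
    '258 Game of Thrones',
    'E457 Pokemon',
    '293A Wool Bed cover',
    '572 C Steel frame whatever',
];

foreach ($titles as $title) {
    // var_export quotes the result, so a trailing character is easy to spot.
    echo var_export(match_sku_prefix($title), true), PHP_EOL;
}
// Prints '258 G', 'E457 P', '293A' and '572 C', one per line.
```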
Then, there are some totally messed up titles, which have the SKU somewhere in the middle… From what I saw, these cases are rare, and when they happen it’s only numbers, so no starting E, or ending single letter. Two examples of this are the following:
- Decorative pillow / Set with bed covers 2456 55Χ55cm
- Pillow 207 45 Χ 65 cm
Fortunately, the SKU in these rare cases is the first whole number that’s met in the title.
So, what I need is `preg_replace` to fix the above totally messed up titles, so that then my pattern can correctly extract the SKU.
Thank you very much in advance.
2 Answers
Use word-boundaries (`\b`) to delineate the SKU from any other characters, then just check for your defining characters. The boundaries will ensure you don’t falsely match `258 G` from `258 Game of Thrones`.
https://3v4l.org/rCEqD
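A minimal sketch of the word-boundary idea, not necessarily identical to the linked demo. The pattern, the constant `SKU_PATTERN` and the function names `extract_sku` and `move_sku_to_front` are assumptions for illustration. It also assumes the stray all-digit SKUs are 3 to 4 digits long, which is why `55` and `45` in the measurements are skipped.

```php
<?php
// Assumed SKU shape: optional "E", 3-4 digits, then optionally one capital
// letter with or without a single space before it. The \b on each side
// stops the match from bleeding into neighbouring word characters.
const SKU_PATTERN = '/\bE?\d{3,4}(?: ?[A-Z])?\b/';

// Hypothetical helper: first SKU-like token anywhere in the title, or null.
function extract_sku(string $title): ?string
{
    return preg_match(SKU_PATTERN, $title, $m) === 1 ? $m[0] : null;
}

// Hypothetical helper: moves the first standalone 3-4 digit number to the
// front of the title. Titles that already start with their SKU, or that
// contain no standalone number, come back unchanged.
function move_sku_to_front(string $title): string
{
    // $1 = everything before the number (lazy), $2 = the number itself.
    $fixed = preg_replace('/^(.*?)\b(\d{3,4})\b\s*/', '$2 $1', $title, 1);
    return rtrim($fixed ?? $title);
}

echo extract_sku('258 Game of Thrones'), PHP_EOL;        // 258
echo extract_sku('572 C Steel frame whatever'), PHP_EOL; // 572 C
echo move_sku_to_front('Pillow 207 45 Χ 65 cm'), PHP_EOL; // 207 Pillow 45 Χ 65 cm
```

Because `SKU_PATTERN` is not anchored to the start of the string, `extract_sku` already finds the SKU in the messed up titles; `move_sku_to_front` is only needed if the titles themselves should be normalised first.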
I can answer the RegEx pattern part: `(E?\d{3,4} ?[A-Z]?(?=\s))`, tested at https://regex101.com with the following text block: