
I’m trying to build a regex to capture the useful part of the filenames of my S3 uploads. I used a regex generator, and so far I have this pattern (which throws an error in JavaScript):

/[A-Za-z]++[^.\w][^.]++|(?<=_)\w++(?=\.)/g

Here are some example strings that I am working with (with the required match for each):

"MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf" // desired match "Bruntwood Leeds Digital Festival ad"

"bbZRU3329BfXXvvAWwP_short-video.mp4" // desired match "short-video"

"zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx" // desired match "MGI-Artificial-Intelligence-Discussion-slides"

If it helps: I need to run this regex in JavaScript.

const filename = "bbZRU3329BfXXvvAWwP_short-video.mp4";
const match = filename.match(regex);
console.log(match); // "short-video"

Thank you!

4 Answers


  1. "I used a regex generator"

    But not for JavaScript regexes, it seems. Every tool and library has its own regex quirks. In particular, JS doesn’t support possessive quantifiers like ++ (nor independent submatches in general, (?> )).

    Look-behind, (?<= ), is also unavailable in JS engines that predate ES2018, so it is not safe to rely on everywhere.

    You could e.g. do this instead:

    const strs = [
        "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf",
        "bbZRU3329BfXXvvAWwP_short-video.mp4",
        "zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx",
    ];
    
    for (const str of strs) {
        const m = /_([^.]+)\./.exec(str);
        if (!m) {
            console.log("no match: " + str);
            continue;
        }
        console.log("match: " + m[1]);
    }
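
    On an engine that does implement ES2018 look-behind, the generated pattern can be salvaged by dropping only the possessive quantifiers. A minimal sketch under that assumption (the names lookBehind, samples and found are made up for illustration):

    ```javascript
    // Assumes an engine with ES2018 look-behind support.
    // (?<=_) requires a preceding "_" and (?=\.) a following ".";
    // neither is consumed, so the whole match m[0] is already the wanted part.
    const lookBehind = /(?<=_)[^.]+(?=\.)/;

    const samples = [
        "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf",
        "bbZRU3329BfXXvvAWwP_short-video.mp4",
    ];

    const found = samples.map((s) => {
        const m = s.match(lookBehind);
        return m ? m[0] : null; // null when nothing matches
    });

    console.log(found);
    ```

    The capture-group version above needs no look-around at all, so it remains the more portable choice.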
  2. Given your examples, you could use a much simpler regex:

    const regex = /_([^.]+)/;
    
    const inputs = [
      "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf", // desired match "Bruntwood Leeds Digital Festival ad"
      "bbZRU3329BfXXvvAWwP_short-video.mp4", // desired match "short-video"
      "zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx" // desired match "MGI-Artificial-Intelligence-Discussion-slides"
    ];
    
    for (const input of inputs) {
      const match = input.match(regex);
      console.log(match[1]);
    }
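
    Note that match() returns null when a filename contains no underscore, in which case match[1] throws a TypeError. A small guarded sketch (extractName is a hypothetical helper name, not part of any library):

    ```javascript
    // Hypothetical helper: returns the text between the first "_" and the
    // next ".", or null when the filename has no "_" followed by such text.
    function extractName(filename) {
      const match = filename.match(/_([^.]+)/);
      return match ? match[1] : null;
    }

    console.log(extractName("bbZRU3329BfXXvvAWwP_short-video.mp4")); // "short-video"
    console.log(extractName("no-prefix.mp4")); // null
    ```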
  3. Don’t use a regex generator unless it targets the regex flavor you will actually run, because syntax and features differ between flavors. What you are after is essentially this:

    _[^.]+
    

    The only difference is that it also matches the preceding _ character, which you can strip afterwards in JS.

    Live demo

    var text = `MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf
    bbZRU3329BfXXvvAWwP_short-video.mp4
    zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides`;
    
    console.log(
      text.match(/_[^.]+/g).map(v => v.slice(1))
    );
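
    If stripping the underscore afterwards feels awkward, a capture group combined with matchAll (ES2020) is one alternative. This is only a sketch, assuming an engine that provides matchAll; the sample text and the names variable are illustrative:

    ```javascript
    const text = [
      "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf",
      "bbZRU3329BfXXvvAWwP_short-video.mp4",
    ].join("\n");

    // With the g flag, matchAll yields one match object per hit, each
    // carrying its capture groups; m[1] is the part after "_".
    // "\n" is excluded as well so a name cannot run into the next line.
    const names = Array.from(text.matchAll(/_([^.\n]+)/g), (m) => m[1]);

    console.log(names);
    ```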
  4. For these example strings you could split on an underscore or a dot, [_.]:

    That will give you an array with 3 parts. The values you are looking for are in the second part [1]:

    const strings = [
      "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf",
      "bbZRU3329BfXXvvAWwP_short-video.mp4",
      "zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx"
    ];
    
    strings.forEach((s) => console.log(s.split(/[_.]/)[1]));
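
    The split only yields three parts while the name itself contains no further underscore or dot. A sketch of a variant that tolerates both, assuming the random upload prefix never contains an underscore (betweenPrefixAndExtension is a hypothetical helper name):

    ```javascript
    // Hypothetical helper: takes everything between the first "_" and the
    // last ".", so underscores and dots inside the name are preserved.
    function betweenPrefixAndExtension(filename) {
      const start = filename.indexOf("_");
      const end = filename.lastIndexOf(".");
      // No prefix separator, or no extension dot after it.
      if (start === -1 || end <= start) return null;
      return filename.slice(start + 1, end);
    }

    const tricky = "zQZFnWVcRUbFNGyGdIP0_my_notes.v2.txt";
    console.log(betweenPrefixAndExtension(tricky)); // "my_notes.v2"
    console.log(tricky.split(/[_.]/)[1]);           // "my"
    ```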