Javascript - Regex match three words in order (there can be words in between)

RJK
September 19, 2023
191 views
0 votes
2 Answers

I’m converting a buffer of a PDF into a string to find the page count. Here is an example of the buffer string:

<<
/Type /Pages
/Count 2
/Kids[4 0 R 8 0 R]
>>

I see that the page count comes after /Type /Pages, as the number following /Count. Here is a different example:

<<
/Type /Pages
/MediaBox [ 0 0 612 792 ]
/Count 22
>>

As you can see, /MediaBox now sits between /Type /Pages and /Count. I am trying to come up with a regular expression that matches the page count in both scenarios, but I’m not sure where to start.
Would it make sense to match three words in order and then extract the number? That is, match Type first, then Pages, then Count, and extract the number after it?
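A minimal sketch of that in-order idea in JavaScript, assuming Node.js (where the PDF arrives as a global `Buffer`); the helper name `pageCountInOrder` is hypothetical. Decoding with `'latin1'` maps each byte to exactly one character, and `[\s\S]*?` lazily skips whatever sits between /Pages and /Count. It only covers dictionaries where /Count comes after /Type /Pages.

```javascript
// Sketch only: assumes Node.js, where Buffer is a global.
// pageCountInOrder is a hypothetical helper name, not a library API.
function pageCountInOrder(pdfBuffer) {
  // 'latin1' maps every byte to exactly one character, so the
  // binary parts of the PDF cannot produce decoding errors.
  const text = pdfBuffer.toString('latin1');

  // Match /Type, then /Pages, then /Count in that order; [\s\S]*? lazily
  // skips anything in between (including newlines) without needing the s flag.
  const match = text.match(/\/Type\s*\/Pages\b[\s\S]*?\/Count\s+(\d+)/);
  return match ? Number(match[1]) : null;
}

const first = Buffer.from('<<\n/Type /Pages\n/Count 2\n/Kids[4 0 R 8 0 R]\n>>', 'latin1');
const second = Buffer.from('<<\n/Type /Pages\n/MediaBox [ 0 0 612 792 ]\n/Count 22\n>>', 'latin1');
console.log(pageCountInOrder(first));  // 2
console.log(pageCountInOrder(second)); // 22
```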

Tags: javascript regex

Answers

- CarySwoveland
- September 13, 2023 at 3:03 am
- 0 votes
```
/\/Type \/Pages\b.*?^\/Count (\d+)/ms
```
should do it, with capture group 1 holding the value of the count.

Demo: https://regex101.com/r/5dk3eP/1. Hover the cursor over each element of the regular expression to obtain an explanation of its function.
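A short sketch of using that pattern from JavaScript on the two samples in the question; `countFrom` is a hypothetical helper name. The `m` flag lets `^` match at the start of every line and the `s` flag (ES2018) lets `.` cross newlines. Because of the `^`, /Count has to start a line, so a one-line dictionary such as `<</Type /Pages /Count 3 >>` is not matched.

```javascript
// The pattern above as a JavaScript regex literal:
// m: ^ matches after every newline; s: . also matches newlines.
const PAGES_COUNT = /\/Type \/Pages\b.*?^\/Count (\d+)/ms;

// Hypothetical helper: returns the count as a number, or null if not found.
function countFrom(text) {
  const match = PAGES_COUNT.exec(text);
  return match ? Number(match[1]) : null;
}

const sampleA = '<<\n/Type /Pages\n/Count 2\n/Kids[4 0 R 8 0 R]\n>>';
const sampleB = '<<\n/Type /Pages\n/MediaBox [ 0 0 612 792 ]\n/Count 22\n>>';

console.log(countFrom(sampleA)); // 2
console.log(countFrom(sampleB)); // 22
```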

- Nick
- September 13, 2023 at 4:06 am
- 0 votes
I would use a regex which looks for /Type and /Pages between << and >> and then captures the Count when they are both found. This will then be completely independent of the order of the tags. You can do this with a tempered greedy token which prevents matching beyond >>:
```
<<(?=(?:(?!>>).)*\/Type)(?=(?:(?!>>).)*\/Pages)(?:(?!>>).)*\/Count\s+(\d+)
```
For sample data of:
```
<<
/Type /Pages
/Count 2
/Kids[4 0 R 8 0 R]
>>
<<
/Type /Pages
/MediaBox [ 0 0 612 792 ]
/Count 22
>>
<<
/Type /Image
/MediaBox [ 0 0 612 792 ]
/Count 1
>>
1 0 obj <</Type /Pages /Kids [3 0 R ] /Count 1 >>
2 0 obj <<   /Count 1   /Kids [ 5 0 R ]   /Type /Pages >>
```
Group 1 will contain [2, 22, 1, 1].

Demo on regex101. Note that you need the `s` flag so that `.` also matches newlines.
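A sketch of collecting every count with `String.prototype.matchAll`, assuming an engine with ES2020 support; `allPageCounts` is a hypothetical helper name and the sample text is the one above. `matchAll` requires the `g` flag, and `s` is needed for the multi-line dictionaries.

```javascript
// Tempered-greedy-token pattern: every (?:(?!>>).)* step refuses to
// cross ">>", so /Type, /Pages and /Count must all sit in the same << ... >>.
// g: find every dictionary; s: let . match newlines inside multi-line ones.
const PAGES_DICT_COUNT =
  /<<(?=(?:(?!>>).)*\/Type)(?=(?:(?!>>).)*\/Pages)(?:(?!>>).)*\/Count\s+(\d+)/gs;

// Hypothetical helper: collect capture group 1 from every match as a number.
function allPageCounts(text) {
  return Array.from(text.matchAll(PAGES_DICT_COUNT), (m) => Number(m[1]));
}

const sample = [
  '<<\n/Type /Pages\n/Count 2\n/Kids[4 0 R 8 0 R]\n>>',
  '<<\n/Type /Pages\n/MediaBox [ 0 0 612 792 ]\n/Count 22\n>>',
  '<<\n/Type /Image\n/MediaBox [ 0 0 612 792 ]\n/Count 1\n>>',
  '1 0 obj <</Type /Pages /Kids [3 0 R ] /Count 1 >>',
  '2 0 obj <<   /Count 1   /Kids [ 5 0 R ]   /Type /Pages >>',
].join('\n');

console.log(allPageCounts(sample)); // [ 2, 22, 1, 1 ]
```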