I’m converting a buffer of a PDF into a string to find the page count. Here is an example of the buffer string:
<<
/Type /Pages
/Count 2
/Kids[4 0 R 8 0 R]
>>
I see that the page count is defined after /Type /Pages and after /Count. Here is a different example:
<<
/Type /Pages
/MediaBox [ 0 0 612 792 ]
/Count 22
>>
As you can see, /MediaBox now sits between /Type /Pages and /Count. I am trying to come up with a regex that can match the page count in both scenarios, but I’m not sure where to start.
Would it make sense to match three words in order and then extract the number? That is, match Type first, then Pages, then lastly Count, and extract the number after it?
2 Answers
should do it, with capture group 1 holding the value of the count.
Demo (https://regex101.com/r/5dk3eP/1). Hover the cursor over each element of the regular expression to obtain an explanation of its function.
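As a minimal sketch of the ordered approach from the question (match /Type /Pages first, then capture the number after /Count), assuming Python's re module. The pattern and the name page_counts_ordered are hypothetical illustrations and not necessarily what the linked demo uses:

```python
import re

# Match "/Type /Pages", then skip anything that is not ">" (so the match
# cannot run past the closing ">>"), then capture the digits after "/Count".
# The negated class [^>] also matches newlines, so no extra flag is needed.
PAGE_COUNT_RE = re.compile(r"/Type\s*/Pages\b[^>]*?/Count\s+(\d+)")


def page_counts_ordered(pdf_text):
    """Return every /Count that follows /Type /Pages in the same dictionary."""
    return [int(m.group(1)) for m in PAGE_COUNT_RE.finditer(pdf_text)]


first = "<<\n/Type /Pages\n/Count 2\n/Kids[4 0 R 8 0 R]\n>>"
second = "<<\n/Type /Pages\n/MediaBox [ 0 0 612 792 ]\n/Count 22\n>>"
print(page_counts_ordered(first), page_counts_ordered(second))
```

This sketch only works when /Count comes after /Type /Pages, and it gives up if any ">" character (for example a hex string or a nested dictionary) appears between the two.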
I would use a regex which looks for /Type and /Pages between << and >> and then captures the Count when they are both found. This will then be completely independent of the order of the tags. You can do this with a tempered greedy token which prevents matching beyond >>:
For sample data of:
Group 1 will contain [2, 22, 1, 1]
Demo on regex101. Note you need the s regex flag to make . match newlines as well.
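The order-independent idea described in this answer can be sketched in Python as follows. This is an illustration under assumptions, not the answer's exact pattern: the regex, the name page_counts_any_order, and the sample string are all hypothetical. The tempered greedy token is (?:(?!>>).)*, which consumes one character at a time only while the next two characters are not >>, and re.DOTALL plays the role of the s flag.

```python
import re

# (?:(?!>>).)* is the tempered greedy token: it never steps over ">>".
# The lookahead confirms /Type /Pages exists somewhere in this dictionary;
# the main pattern then finds /Count wherever it sits inside the same one.
PAGES_DICT_RE = re.compile(
    r"<<(?=(?:(?!>>).)*/Type\s*/Pages\b)"  # dictionary must contain /Type /Pages
    r"(?:(?!>>).)*?/Count\s+(\d+)",        # capture the number after /Count
    re.DOTALL,                             # let "." match newlines (the s flag)
)


def page_counts_any_order(pdf_text):
    """Return /Count values from dictionaries that also contain /Type /Pages."""
    return [int(m.group(1)) for m in PAGES_DICT_RE.finditer(pdf_text)]


# Hypothetical sample: the two dictionaries from the question, one with
# /Count before /Type, and a /Type /Page dictionary that should be ignored.
sample = (
    "<<\n/Type /Pages\n/Count 2\n/Kids[4 0 R 8 0 R]\n>>\n"
    "<<\n/Type /Pages\n/MediaBox [ 0 0 612 792 ]\n/Count 22\n>>\n"
    "<<\n/Count 5\n/Type /Pages\n>>\n"
    "<<\n/Type /Page\n/Count 9\n>>"
)
print(page_counts_any_order(sample))
```

Because the token stops at the first >>, a nested dictionary that appears before /Count or /Type would cut the search short, so this is a sketch for flat dictionaries like the ones shown in the question.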