PHP - Why do I not get a match using preg_match

objelland
August 18, 2023

I’m trying to extract the values Navn, Telefon, E-postadresse and Adresse from a text file. The text file was converted from a PDF, hence the blank lines between the lines of text. I’m using PHP’s preg_match_all to extract the values, but the loop only extracts the first value, Navn; none of the other fields match. Can anyone point me in the right direction?
Here is the text file content:

    Kontaktperson:

    Navn: Johan Wathne

    Telefon: 99566530

    99566530

    E-postadresse: johancqöhwheno

My PHP code looks like this; the variable $fileContent contains the text above:

if (preg_match('/Kontaktperson:\s*(.*?)\n\n/ms', $fileContent, $matches)) {
    $kontaktpersonSection = trim($matches[1]);

    // Extract Tiltakshaver fields
    if (preg_match_all('/(Navn|Telefon|E-postadresse|Adresse):\s*([^\n]+)\s*(?:\n\n|$)/', $kontaktpersonSection, $kontaktMatches, PREG_SET_ORDER)) {
        $kontaktpersonInfo = "Kontaktperson\n";
        $currentkontakt = "";
        foreach ($kontaktMatches as $kontaktMatch) {
            if ($kontaktMatch[1] === "Navn") {
                $currentkontakt = "Navn";
                $kontaktValue = $kontaktMatch[2];
                $kontaktpersonInfo .= "$currentkontakt: $kontaktValue\n";
            } elseif ($kontaktMatch[1] === "Telefon") {
                $currentkontakt = "Telefon";
                $kontaktValue = explode("\n", $kontaktMatch[2])[0];
                $kontaktpersonInfo .= "$currentkontakt: $kontaktValue\n";
            } elseif ($kontaktMatch[1] === "E-postadresse") {
                $currentkontakt = "E-postadresse";
                $kontaktValue = $kontaktMatch[2];
                $kontaktpersonInfo .= "$currentkontakt: $kontaktValue\n";
            } elseif ($kontaktMatch[1] === "Adresse") {
                $currentkontakt = "Adresse";
                $kontaktValue = $kontaktMatch[2];
                $kontaktpersonInfo .= "$currentkontakt: $kontaktValue\n";
            }
        }
        $extractedInfo .= $kontaktpersonInfo;
    }
}

Tags: php

Answers

- PatrickJanser
- August 15, 2023 at 3:53 pm
- 0 votes
As I mentioned in a comment, your first regular expression doesn’t
capture everything you want: the lazy (.*?) stops at the first blank
line, so only the Navn line reaches the second pattern and the other
fields are missing.
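
A quick way to check this is to print what the first pattern captures. The sketch below assumes the file content looks like the sample in the question; the $fileContent string is reconstructed from it and is not the real file.

```php
<?php
// Sample input reconstructed from the question (fields separated by blank lines).
$fileContent = "Kontaktperson:\n\nNavn: Johan Wathne\n\nTelefon: 99566530\n\n99566530\n";

// The question's first pattern: the lazy (.*?) ends at the first "\n\n".
$found = preg_match('/Kontaktperson:\s*(.*?)\n\n/ms', $fileContent, $matches);
$kontaktpersonSection = $found ? trim($matches[1]) : '';

echo $kontaktpersonSection, "\n"; // prints: Navn: Johan Wathne
```

Everything after the Navn line is already gone before preg_match_all runs, which is why only one field is found.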

I would change your regular expression to this:
```
/
# Opening of contact person:
Kontaktperson:
\s*   # Ignore whitespace
(.*?) # Capture all the fields.
# It should end with one of these:
(?: # Non-capturing group
  $ # End of file
  | # or
  \n{3} # 3 new lines
  | # or
  \n(?=Kontaktperson:) # A new line followed by Kontaktperson:
)
/sx
```
See it in action, with an explanation of each part: https://regex101.com/r/YJMYhJ/1

I don’t know what marks the end of the contact person section in your
file, but I assume it could be:
- The end of the file: $ (don’t use the m flag, otherwise $ also
  matches at the end of every line).
- At least 3 new lines (as you already have 2 new lines between fields).
- A new line followed by the opening of a new contact person section.
  Here I use a positive lookahead to avoid "eating" the Kontaktperson:
  text, so that the next occurrences can still be matched.
I used the x flag so that you can put comments in your pattern. PHP has
no g modifier; call preg_match_all() to find every section.
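
Here is a minimal sketch of how the pattern could be used from PHP, assuming the input looks like the sample in the question. The email value is a placeholder, and the second, line-anchored field pattern is my own assumption rather than something from the question.

```php
<?php
// Sample input modelled on the question's text file (the email is a placeholder).
$fileContent = "Kontaktperson:\n\nNavn: Johan Wathne\n\nTelefon: 99566530\n\n99566530\n\nE-postadresse: johan@example.com\n";

$sectionPattern = '/
    Kontaktperson:
    \s*                        # skip whitespace after the heading
    (.*?)                      # capture the fields lazily
    (?:
        $                      # end of input
      | \n{3}                  # or three consecutive newlines
      | \n(?=Kontaktperson:)   # or a newline before the next section
    )
/sx';

$contacts = [];
if (preg_match_all($sectionPattern, $fileContent, $sections, PREG_SET_ORDER)) {
    foreach ($sections as $section) {
        $fields = [];
        // One "Label: value" pair per line; [^\n]+ stops at the end of the line.
        preg_match_all(
            '/^(Navn|Telefon|E-postadresse|Adresse):[ \t]*([^\n]+)/m',
            $section[1],
            $pairs,
            PREG_SET_ORDER
        );
        foreach ($pairs as $pair) {
            $fields[$pair[1]] = trim($pair[2]);
        }
        $contacts[] = $fields;
    }
}

print_r($contacts);
```

The bare 99566530 line has no label, so the field pattern skips it, and Adresse is simply absent from the result when the file has no such line.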

- fatpenguin
- August 18, 2023 at 5:50 am
- 0 votes
You don’t need to use a regex if your input is well structured.

You can use

$blob = explode("\n", $fileContent);

to break $fileContent into an array of lines, then use strpos() and substr() to extract everything to the right of the colon on the lines you care about.
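
A minimal sketch of that approach, assuming each field sits on its own "Label: value" line as in the question. The $wanted list and the placeholder email are mine, for illustration only.

```php
<?php
// Sample input modelled on the question's text file (the email is a placeholder).
$fileContent = "Kontaktperson:\n\nNavn: Johan Wathne\n\nTelefon: 99566530\n\n99566530\n\nE-postadresse: johan@example.com\n";

$wanted = ['Navn', 'Telefon', 'E-postadresse', 'Adresse'];
$fields = [];

foreach (explode("\n", $fileContent) as $line) {
    $line = trim($line);           // also drops a trailing "\r" from Windows line endings
    $colon = strpos($line, ':');
    if ($colon === false) {
        continue;                  // blank lines and value-only lines have no label
    }
    $label = substr($line, 0, $colon);
    if (in_array($label, $wanted, true)) {
        // Everything to the right of the first colon is the value.
        $fields[$label] = trim(substr($line, $colon + 1));
    }
}

print_r($fields);
```

Comparing against false with === matters here, because strpos() returns 0 when the colon is the first character of the line.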
