
I have this regex to match the src attribute of img tags along with the alt or title attribute, but it only works if src comes first. How should I modify it to match these three attributes in any order?
Or is there a more accurate way to do this by parsing the HTML elements? I assume that with a regex I might get back array elements without knowing which value belongs to which attribute.

For example, it currently matches:

<img src="landscape.jpg" title="My landscape">

but not

<img title="My landscape" src="landscape.jpg">

The current regex is:

preg_match_all('#<img\s+[^>]*src="([^"]*)"(?:\s+[^>]*(?:alt|title)="([^"]+)")?[^>]*>#is', $url_contents, $image_matches);

2 Answers


  1. Chosen as BEST ANSWER

    I found a simple example using DOMDocument. It does just what I wanted and seems far more reliable than what I tried with regex.

    <?php

    // $url is assumed to hold the address (or path) of the page to scan
    $dom = new DOMDocument();

    // Load the HTML; @ suppresses warnings about malformed markup
    @$dom->loadHTMLFile($url);

    // Select every <img> element
    $images = $dom->getElementsByTagName('img');

    foreach ($images as $element) {
        // getAttribute() returns '' when the attribute is missing
        $src    = $element->getAttribute('src');
        $alt    = $element->getAttribute('alt');
        $height = $element->getAttribute('height');
        $width  = $element->getAttribute('width');

        // Rebuild the tag from the extracted attributes, escaping values for output
        echo '<img src="' . htmlspecialchars($src) . '" alt="' . htmlspecialchars($alt)
            . '" height="' . htmlspecialchars($height) . '" width="' . htmlspecialchars($width) . '"/>';
    }

    ?>
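
    The question works on an already-fetched string ($url_contents) and asks for src, alt and title, so here is a minimal sketch of the same DOMDocument idea adapted to that case. The function name extractImageAttributes and the sample markup are assumptions made for illustration, not part of the original answer.

    ```php
    <?php

    // Hypothetical helper (not from the original answer): collect src, alt and
    // title for every <img> in an HTML string, whatever order they appear in.
    function extractImageAttributes(string $html): array
    {
        $dom = new DOMDocument();

        // Collect parser warnings about imperfect markup instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $images = [];
        foreach ($dom->getElementsByTagName('img') as $img) {
            // getAttribute() returns an empty string when the attribute is absent.
            $images[] = [
                'src'   => $img->getAttribute('src'),
                'alt'   => $img->getAttribute('alt'),
                'title' => $img->getAttribute('title'),
            ];
        }

        return $images;
    }

    // Sample markup based on the two cases in the question.
    $html = '<img src="landscape.jpg" title="My landscape">'
          . '<img title="My landscape" src="landscape.jpg">';

    $images = extractImageAttributes($html);
    print_r($images);
    ```

    Because each value is stored under its attribute name, there is no guessing about which array element is which, and attribute order in the markup no longer matters.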
    

  2. You could use:

    (?<=<img)(?: (src|title|alt)="([^"]+)")?(?: (src|title|alt)="([^"]+)")?(?: (src|title|alt)="([^"]+)")?
    
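    • (?<=<img) – lookbehind: the match must start immediately after an <img start tag
    • (?: (src|title|alt)="([^"]+)")? – optionally match a space, then a src, title, or alt attribute and its quoted value; the name goes into group 1 and the value into group 2
    • (?: (src|title|alt)="([^"]+)")? – the same again, filling groups 3 and 4
    • (?: (src|title|alt)="([^"]+)")? – the same again, filling groups 5 and 6

    Since every group accepts any of the three names, the order does not matter, but the attributes have to follow <img directly, separated by single spaces.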

    https://regex101.com/r/GXyAZf/1
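
    As a rough sketch of how this pattern could be used from PHP, the name/value pairs can be folded into an associative array so each value is labelled by its attribute. The "#" delimiter, the variable names and the sample markup are assumptions for illustration.

    ```php
    <?php

    // Pattern quoted from the answer above, wrapped in "#" delimiters.
    $pattern = '#(?<=<img)(?: (src|title|alt)="([^"]+)")?(?: (src|title|alt)="([^"]+)")?(?: (src|title|alt)="([^"]+)")?#';

    // Sample markup: the case from the question that the original regex missed.
    $html = '<img title="My landscape" src="landscape.jpg">';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $images = [];
    foreach ($matches as $match) {
        $attributes = [];
        // Captures come in (name, value) pairs: groups 1-2, 3-4 and 5-6.
        for ($i = 1; $i + 1 < count($match); $i += 2) {
            if ($match[$i] !== '') {
                $attributes[$match[$i]] = $match[$i + 1];
            }
        }
        // Skip tags where none of the three attributes followed <img directly.
        if ($attributes !== []) {
            $images[] = $attributes;
        }
    }

    print_r($images);
    ```

    An img tag that starts with some other attribute (for example class) yields only an empty match here and is skipped, which is one reason the DOM-based approach tends to be more robust.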
