Hello, I have code that copies the HTML from an external URL and echoes it on my page.
Some of that HTML contains links and/or image src attributes.
I need help converting them from absolute URLs to relative URLs inside $data.

For example, the HTML contains an href:

<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >

or a src:

<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">

I would like to keep only the path:

/products/score-vs-ibd/

/Filters/MinDUp1.gif

Maybe this can be done with preg_replace, but I'm not familiar with regular expressions.

This is my original code. It works very well, but now I'm stuck on shortening the links.

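<?php
// Use the first tag of the current post as the ticker symbol.
$tag = '';
$post_tags = get_the_tags();
if ( $post_tags ) {
    $tag = $post_tags[0]->name;
}
// Fetch the remote page for that symbol.
$html = file_get_contents('https://www.trade-ideas.com/ticky/ticky.html?symbol=' . urlencode($tag));

// Keep only the markup between the opening div and the closing marker comment.
$start = strpos($html, '<div class="span3 height-325"');
$end = strpos($html, '<!-- /span -->', $start);
$data = substr($html, $start, $end - $start);
echo $data;
?>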

2 Answers


  1. Here is the code:

    function getUrlPath($url) {
       // Optional scheme, then host (and port) up to the first "/" or "?"; capture the rest.
       $re = '/(?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*)/';
       preg_match($re, $url, $matches);
       return $matches[1];
    }
    

    Example: getUrlPath("http://myassets.com:80/files/images/image.gif") returns files/images/image.gif
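
    As an alternative to the regex above, PHP's built-in parse_url() can split a URL into its parts. The following is only a sketch: urlToRelative is a hypothetical helper name, not something from this answer. Unlike getUrlPath, it keeps the leading slash, which is the form the question asks for.

    ```php
    <?php
    // Hypothetical helper: returns the path (plus query string, if any) of an absolute URL.
    function urlToRelative(string $url): string {
        $path  = parse_url($url, PHP_URL_PATH);   // e.g. "/Filters/MinDUp1.gif", or null if absent
        $query = parse_url($url, PHP_URL_QUERY);  // e.g. "param=value", or null if absent

        // Fall back to the site root when the URL has no path component.
        $relative = is_string($path) && $path !== '' ? $path : '/';
        if (is_string($query) && $query !== '') {
            $relative .= '?' . $query;
        }
        return $relative;
    }

    echo urlToRelative('http://myassets.com:80/files/images/image.gif'), "\n";
    // prints /files/images/image.gif
    ```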

  2. You can locate all the URLs in the html string with a regex using preg_match_all().
    The regex:

    '/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i'
    

    will capture both the entire URL and the path/query string for every occurrence of ="http://domain/path" or ='https://domain/path?query' (http/https, single or double quotes, with/without query string).
    Then you can just use str_replace() to update the html string.

    <?php
    $html = '<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >
    <img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">
    <img src=\'https://static.trade-ideas.com/Filters/MinDUp1.gif?param=value\'>';

    // Group 1 is the full absolute URL, group 2 is its path and query string.
    $pattern = '/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i';
    $urls = [];
    preg_match_all($pattern, $html, $urls);
    //var_dump($urls);
    // Replace each absolute URL with its path-only counterpart.
    foreach ($urls[1] as $i => $uri) {
        $html = str_replace($uri, $urls[2][$i], $html);
    }
    echo $html;
    

    Note, this will change all absolute URLs enclosed in quotes immediately following an =.
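
    If rewriting every quoted URL after an = is too broad, one option is to limit the match to href and src attributes. This is a rough sketch rather than part of the answer's method: relativizeLinks is a hypothetical name, and the pattern assumes each attribute value is quoted and written without spaces around the =.

    ```php
    <?php
    // Hypothetical helper: strips scheme and host from href/src values only.
    function relativizeLinks(string $html): string {
        // 1: attribute name, 2: opening quote, 3: path and query (optional).
        $pattern = '~\b(href|src)=(["\'])https?://[^/"\'\s>]+(/[^"\']*)?\2~i';

        $result = preg_replace_callback($pattern, function (array $m): string {
            // A bare "https://host" has no path, so point it at the site root.
            $path = (isset($m[3]) && $m[3] !== '') ? $m[3] : '/';
            return $m[1] . '=' . $m[2] . $path . $m[2];
        }, $html);

        // preg_replace_callback() returns null on a regex error; keep the input in that case.
        return $result ?? $html;
    }

    echo relativizeLinks('<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">'), "\n";
    // prints <img src="/Filters/MinDUp1.gif">
    ```

    With the question's code, the call would be applied to $data just before the echo. Other attributes, such as alt or title, are left unchanged even if they contain a URL.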
