
I have a PHP script that builds a sitemap (an XML file following the standard sitemap structure).

My question is about improving it. As you know, a website gets new posts daily. A post may also be edited several times per hour, day, or month. I have two strategies to handle that:

  1. Writing a new PHP script that parses the XML file, finds and modifies the matching node when a post is edited, and adds a new node when a post is added (it needs to count all nodes before inserting a new one, since a sitemap file can hold at most 50,000 URLs).

  2. Running my current PHP script on a fixed schedule (e.g. every night at midnight) using a cron job. That means rebuilding the sitemap from scratch every night.

Which strategy is more efficient? Which one is the standard approach?
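
For illustration, the 50,000-URL limit mentioned in option 1 could also be handled by splitting the URL list across several files instead of counting nodes on every insert. This is only a rough sketch: the helper name `chunkSitemapUrls`, the constant, and the `sitemap-N.xml` naming scheme are assumptions made for the example.

```php
<?php
// Hypothetical constant: the per-file URL limit mentioned in the question.
const MAX_URLS_PER_SITEMAP = 50000;

// Hypothetical helper: group a flat URL list so each group fits in one sitemap file.
function chunkSitemapUrls(array $urls, int $limit = MAX_URLS_PER_SITEMAP): array
{
    $files = [];
    foreach (array_chunk($urls, $limit) as $i => $chunk) {
        // The file naming scheme is an assumption; any scheme works.
        $files['sitemap-' . ($i + 1) . '.xml'] = $chunk;
    }
    return $files;
}

// Usage with a tiny limit so the split is visible.
$files = chunkSitemapUrls(['/a', '/b', '/c'], 2);
```

With the limit set to 2, the three URLs end up in two groups keyed `sitemap-1.xml` and `sitemap-2.xml`.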

3 Answers


  1. This depends on how busy your website is.
    If you have a small website where content changes on a weekly or monthly basis, you can simply create an XML and HTML sitemap with a script whenever new content is available and upload it to your web space.

    If you have a website with many pages and almost daily updates, such as a blog, it is handy to generate a new sitemap automatically whenever new content is ready.

    If you are using a CMS, there is a wide range of plugins that can update the sitemap incrementally. Or you could have your own script do it.

  2. Modifying an XML file has its dangers. One is that you need to work out and apply the right actions (replace, insert, delete). This is complex and error-prone. Another is that sitemaps can be large, so loading them into memory for modification might not be possible.

    I suggest you generate the XML sitemap in a cron job. Do not overwrite the current sitemap directly; write the new one to a separate file and copy or link it into place once it is complete. This avoids ending up with no sitemap at all if there is an error.
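
    The "build it elsewhere first, then put it in place" idea could look roughly like this in PHP. It is a minimal sketch: the helper name `publishSitemap` and the `.tmp` suffix are assumptions for the example, and it relies on `rename()` replacing the target in a single step, which holds on POSIX systems when both paths are on the same filesystem.

    ```php
    <?php
    // Hypothetical helper: write the new sitemap next to the live one, then swap it in.
    function publishSitemap(string $xml, string $target): bool
    {
        $tmp = $target . '.tmp'; // assumed temp name, same directory as the target

        // If the write fails or is incomplete, the live file is left untouched.
        if (file_put_contents($tmp, $xml) !== strlen($xml)) {
            @unlink($tmp);
            return false;
        }

        // Replace the live sitemap only after the new one is fully written.
        return rename($tmp, $target);
    }
    ```

    A cron-driven script would build the whole XML string first and call `publishSitemap($xml, 'sitemap.xml')` as its last step.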

    If you want to manage the URLs incrementally, do so in an SQL table and treat the XML sitemap as an export of that table.
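
    To illustrate the "export" idea, here is a small sketch that turns rows into sitemap XML. The function name `exportSitemap`, the row keys `loc` and `lastmod`, and the table name in the comment are all assumptions; the rows are hard-coded here, where a real script would fetch them from the database.

    ```php
    <?php
    // Hypothetical row shape: ['loc' => URL, 'lastmod' => W3C datetime string].
    function exportSitemap(iterable $rows): string
    {
        $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . PHP_EOL;
        $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . PHP_EOL;
        foreach ($rows as $row) {
            // Escape &, <, > and quotes so the URL is valid XML text.
            $loc = htmlspecialchars($row['loc'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
            $xml .= "  <url><loc>$loc</loc><lastmod>{$row['lastmod']}</lastmod></url>" . PHP_EOL;
        }
        return $xml . '</urlset>' . PHP_EOL;
    }

    // In practice $rows would come from a query such as
    // SELECT loc, lastmod FROM sitemap_urls (table and column names are assumptions).
    $rows = [
        ['loc' => 'https://example.com/?a=1&b=2', 'lastmod' => '2024-01-01T00:00:00+00:00'],
    ];
    echo exportSitemap($rows);
    ```

    Edits and new posts then only touch the table, and the XML file is regenerated from it whenever needed.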

  3. I'm working on something similar. At first I wanted to read the file into an array of lines and check each line to see whether the URL already exists: if so, update its modification time; if not, create a new node. I had a lot of trouble comparing the lines, came here for answers, and did not find one, so I went back and tested many things until it worked. Here is how I did it, for anyone else looking for answers. Call the function every time a new page is created (through a post, etc.).

    <?php
    // Create the folder (and any missing parents); returns true if it exists afterwards.
    function folderCreate($folderPaths){
      if($folderPaths==='') return true; // empty path means the current directory
      if(!is_dir($folderPaths)) mkdir($folderPaths,0755,true);
      return is_dir($folderPaths);
    }
    function makeSitemap($fileLocation,$fileName='sitemap.xml'){
        if(folderCreate($fileLocation)){
            $sitemapXML='<?xml version="1.0" encoding="UTF-8"?>'. PHP_EOL;
            $sitemapXML.='<urlset xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'. PHP_EOL;
            $sitemapXML.='</urlset>'. PHP_EOL;
            file_put_contents($fileLocation.$fileName,$sitemapXML);
            return file($fileLocation.$fileName);
        }
    }
    // Zero-pad a number to $powOf+1 digits, e.g. 5 -> "05" and 10 -> "10".
    function numbering($int,$powOf=1){
        return str_pad((string)$int,$powOf+1,"0",STR_PAD_LEFT);
    }
    function sitemapEdit2($url,$state="add",$fileLocation='',$sitemapName='sitemap.xml',$backup='backups/xml/sitemaps/'){
        $date=getdate(date('U'));
        if($fileLocation!="" && substr($fileLocation,-1)!="/")$fileLocation.="/";
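        $sitemapFile = is_file("$fileLocation$sitemapName") ? file("$fileLocation$sitemapName") : [];
        if(!$sitemapFile) $sitemapFile=makeSitemap($fileLocation,$sitemapName); // missing or empty: start a fresh sitemap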
        {//create a backup: Always create a back up DX
            if($sitemapFile && folderCreate($backup)){
                $fileArr=explode(".",$sitemapName);
                copy("$fileLocation$sitemapName",$backup.$fileArr[0]."-backup-".substr(time(),5,-2).".".$fileArr[1]);
            }
        }
        {//check if url exists
            $firstLine=$lastLine=null; $urlEndLine=-1; // defaults for when the url or </urlset> is not found
            foreach($sitemapFile as $k=>$value) {
                $value=trim($value);
                if($value=="<loc>$url</loc>"){$firstLine=$k-1; $lastLine=$k+3;}
                if($value=="</urlset>"){$urlEndLine=$k-1;} 
            }
        }
        {//create conditions
            $newFile='';
            $lastMod=gmdate('Y-m-d\TH:i').":00+00:00"; // UTC time, so it matches the +00:00 offset
            foreach($sitemapFile as $k=>$value) {
                $newFile_tmp=$value;
                if($state=="add"){
                    if($firstLine){//modify Lastmod
                        if($firstLine+2==$k)
                        $newFile_tmp=   "       <lastmod>$lastMod</lastmod>". PHP_EOL;
                    }else if($k==$urlEndLine){//add new url
                        $newFile_tmp.=  "   <url>". PHP_EOL;
                        $newFile_tmp.=  "       <loc>$url</loc>". PHP_EOL;
                        $newFile_tmp.=  "       <lastmod>$lastMod</lastmod>". PHP_EOL;
                        $newFile_tmp.=  "       <priority>0.80</priority>". PHP_EOL;
                        $newFile_tmp.=  "   </url>". PHP_EOL;
                    }
                }else if($state=="remove" && $firstLine){
                    if($k>=$firstLine && $k<=$lastLine){
                        $newFile_tmp="";
                    }
                }
                $newFile.=$newFile_tmp;
            }
        }
        return file_put_contents($fileLocation.$sitemapName,$newFile); 
    }
    
    {//tests
        {//test 1: try test 1, then comment it out and try test 2
            if(sitemapEdit2("someurl3","add",'xml/test/','test_file.xml'))echo "file successfully updated <br>";
            if(sitemapEdit2("someurl4","add",'xml/test/','test_file.xml'))echo "file successfully updated <br>";
        }
        {//test 2: comment out test 1 and try test 2
            // if(sitemapEdit2("someurl3","remove",'xml/test/','test_file.xml'))echo "file successfully updated";
        }
    }
    ?>
    