I have a PHP script which builds a sitemap (a XML file accodring to the standard sitemap structure).
My question is about improving it. As you you, a website has new posts daily. Also post may be edited several times per hour/day/month or whenever. I have two strategy to handle that:
-
Making a new PHP script which parse that XML file and finds the node and modify it when the post is edited and add a new node when a new post is added (it needs to count the number of all nodes before inserting a new one, since a sitemap file can has 50,000 URL utmost).
-
Exucuting my current PHP script according to a specific period daily (i.e every night on midnight) using a Cron-Jobs. It means rebuilding it from the scratch every time (actually building a new sitemap every night)
Ok which strategy is more optimal and profitable? Which one is the standard approach?
3
Answers
This depends on busy your website is.
If you have a small website where content changes happen either on a weekly- or monthly-basis, you can simply create an XML- and HTML-sitemap by script, any time new content is available and upload it to your webspace.
If you have a website with many pages and an almost daily update frequency, such as a blog, it is quite handy if you can automatically generate a new sitemap anytime new content is ready.
If you are using a CMS then you have a wide range of plugins that could update it incrementally. Or you could just make your script do it.
Modifying a XML file has its dangers. One reason is that you need to compare and compile actions (replace, insert, delete). This is complex and the possibility of errors is high. Another problem is that sitemaps can be large, loading them into memory for modifications might not be possible.
I suggest you generate the XML sitemap in a cronjob. Do not overwrite the current sitemap directly but copy/link it after it is completed. This avoids having no sitemap at all if here is an error.
If you like to manage the URLs incrementally do so in an SQL table, treat the XML sitemap as an export of this table.
Im working on something similar, at first i wanted to break it into arrays and read each line to find out if the url already esists, if so modify the time, if not then create a new node but i had alot of issues comparing the lines so i came on here for answers didnt get it so i wnt back and tested and tried many things till the answer came to me so here’s how i did it for anyone else looking for answers. Call the function everytime theres a new page created through a post ect