I am trying to filter the results of an XML feed generated for Facebook. Currently, the feed looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<title><![CDATA[Title]]></title>
<link><![CDATA[https:/path/]]></link>
<description>WooCommerce Product List RSS feed</description>
<metadata>
<ref_application_id>451257172939091</ref_application_id>
</metadata>
<item>
<g:id>anID</g:id>
<g:inventory>5</g:inventory>
<g:description><![CDATA[Some Text]]></g:description>
<g:condition>new</g:condition>
<g:mpn>sku</g:mpn>
<g:title>Product Title</g:title>
<g:availability>in stock</g:availability>
<g:price>185.00 EUR</g:price>
<g:brand><![CDATA[BRAND1]]></g:brand>
</item>
<item>
<g:id>anID</g:id>
<g:inventory>5</g:inventory>
<g:description><![CDATA[Some Text]]></g:description>
<g:condition>new</g:condition>
<g:mpn>sku</g:mpn>
<g:title>Product Title</g:title>
<g:availability>in stock</g:availability>
<g:price>185.00 EUR</g:price>
<g:brand><![CDATA[BRAND2]]></g:brand>
</item>
<item>
<g:id>anID</g:id>
<g:inventory>5</g:inventory>
<g:description><![CDATA[Some Text]]></g:description>
<g:condition>new</g:condition>
<g:mpn>sku</g:mpn>
<g:title>Product Title</g:title>
<g:availability>in stock</g:availability>
<g:price>185.00 EUR</g:price>
<g:brand><![CDATA[BRAND2]]></g:brand>
</item>
............
I need to remove some nodes based on the brand value. My code currently looks like this:
$xmlstr = get_xml_from_url('urlToXMLFeed/xml/testfeed1.xml');
$xmlobj = new SimpleXMLElement($xmlstr);
$xmlobj->registerXPathNamespace("g", "http://base.google.com/ns/1.0");
$i = 0;
foreach($xmlobj->channel->item as $item)
{
    $namespaces = $item->getNameSpaces(true);
    // echo $namespaces;
    $gbrand = $item->children($namespaces['g']);
    $finalBrand = $gbrand->brand;
    if(strcmp($finalBrand,"BRAND2") == 0)
    {
        unset($item);
    }
    $i ++;
}
//Format XML to save indented tree rather than one line and save
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xmlobj->asXML());
$dom->save('fileName.xml');
The newly generated XML still contains the nodes with BRAND2. I also tried using this to remove the node:
unset($xmlobj->channel->item->{$i});
But the second approach only removes the first occurrence of BRAND2.
I also tried the following:
$domElemsToRemove = array();
.......
if(strcmp($finalBrand,"BRAND2") == 0)
{
$domElemsToRemove[] = $item;
}
}
....
foreach( $domElemsToRemove as $domElement ){
$xmlobj->channel->removeChild($domElement);
}
But the result is still the same.
3 Answers
I found a possible solution for my case. According to a PHP comment here, we cannot remove nodes directly inside the loop, as my first approach did, because that breaks the iteration. Thus, we have to create a queue with the nodes we wish to remove. Then, based on the answer here, the queued nodes are removed once the loop has finished.
That removes all the nodes from the XML object properly!
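The queue-then-remove idea described above could look roughly like the following. This is only a sketch under assumptions: the trimmed three-item feed in $xmlstr is hypothetical, and removing each queued node through dom_import_simplexml() is an assumed removal step, not necessarily the code the linked answer used.

```php
<?php
// Hypothetical feed, trimmed from the sample in the question.
$xmlstr = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item><g:id>a</g:id><g:brand><![CDATA[BRAND1]]></g:brand></item>
    <item><g:id>b</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
    <item><g:id>c</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
  </channel>
</rss>
XML;

$xmlobj = new SimpleXMLElement($xmlstr);
$gNs    = 'http://base.google.com/ns/1.0';

// First pass: only collect the items to delete, do not touch the tree yet.
$toRemove = [];
foreach ($xmlobj->channel->item as $item) {
    $brand = (string) $item->children($gNs)->brand;
    if ($brand === 'BRAND2') {
        $toRemove[] = $item;
    }
}

// Second pass: remove each queued item via its DOM counterpart
// (assumed removal step; requires the DOM extension).
foreach ($toRemove as $item) {
    $node = dom_import_simplexml($item);
    $node->parentNode->removeChild($node);
}

echo $xmlobj->asXML();
```

Because the DOM node and the SimpleXML element share the same underlying document, $xmlobj reflects the removals without any re-import.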
Here is a solution based on XSLT.
It removes all <item> elements with g:brand='BRAND2'.
Input XML
XSLT
Output XML
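As a rough illustration of the XSLT technique, and not the answer's own stylesheet, the sketch below combines an identity transform with an empty template for the unwanted items. It assumes PHP's XSL extension is enabled and uses a hypothetical, trimmed feed in $xmlstr.

```php
<?php
// Hypothetical feed, trimmed from the sample in the question.
$xmlstr = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item><g:id>a</g:id><g:brand><![CDATA[BRAND1]]></g:brand></item>
    <item><g:id>b</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
    <item><g:id>c</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
  </channel>
</rss>
XML;

$xsl = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:g="http://base.google.com/ns/1.0">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: items of this brand produce no output. -->
  <xsl:template match="item[g:brand='BRAND2']"/>
</xsl:stylesheet>
XSL;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlstr);

$xslDoc = new DOMDocument();
$xslDoc->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc);

$result = $proc->transformToXml($xmlDoc);
echo $result;
```

One caveat with this sketch: CDATA sections come out as ordinary escaped text unless the stylesheet lists the affected elements in the cdata-section-elements attribute of xsl:output.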
You’re already pretty close. With PHP SimpleXML:
See it running online: https://3v4l.org/nWOrY
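A minimal sketch of that approach, written for illustration and not copied from the linked listing: it assumes a hypothetical, trimmed three-item feed in $xmlstr, selects the BRAND2 items with xpath(), and deletes each one through its self-reference.

```php
<?php
// Hypothetical feed, trimmed from the sample in the question.
$xmlstr = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item><g:id>a</g:id><g:brand><![CDATA[BRAND1]]></g:brand></item>
    <item><g:id>b</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
    <item><g:id>c</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
  </channel>
</rss>
XML;

$xmlobj = new SimpleXMLElement($xmlstr);

// xpath() returns a plain PHP array of elements, so deleting inside
// this loop does not disturb the traversal.
$matches = $xmlobj->xpath('/rss/channel/item[g:brand = "BRAND2"]');

foreach ($matches as $item) {
    // Self-reference: removes the element itself from the document,
    // unlike unset($item), which only drops the local variable.
    unset($item[0]);
}

echo $xmlobj->asXML();
```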
This works already because:
unset($item) only unsets the variable. Using the SimpleXMLElement self-reference $item[0] with unset removes the element from the XML document.
Traversal by foreach ($xmlobj->channel->item as $item) is invalidated if unset($item[0]) is called within the foreach body (PHP Fatal error: Uncaught Error: SimpleXMLElement is not properly initialized). Foreach-ing over an array of SimpleXMLElements is immune. $xmlobj->xpath() returns such an array already.
Such an array could be created as well with iterator_to_array($xmlobj->channel->item, false) (second parameter false with SimpleXML!) if you do not want to use the SimpleXMLElement::xpath() function.
XPath in SimpleXML has the XPath namespaces already registered ("g" for "http://base.google.com/ns/1.0", as it is defined in the XML document element).
XPath itself is powerful in querying the document for elements (or attributes).
Some references:
Compare with the answer "Remove a child with a specific attribute, in SimpleXML for PHP" (2008 by Stefan Gehrig), which is nearly identical to your scenario, different only in that the decision is made on attribute content rather than element content, and don’t miss the "By the way: selecting specific nodes is much more simple when you use XPath" part at the end.
Then compare with my 2013 answer there that shows the SimpleXMLElement self-reference (with unset). It is a concept that dates back to earlier contributions on Stack Overflow, for example in "Remove multiple empty nodes with SimpleXML" (2012 by Beshoy Girgis), and there should be a couple more.
Perhaps Kamil Szot coined the term SimpleXMLElement self-reference back in June 2010 with "How can I set text value of SimpleXmlElement without using its parent?"
And a closing comment:
I write this as well because, as far as SimpleXML is concerned, DOMNodeList or importing into DOM (dom_import_simplexml()) should not be necessary for element deletion. It is more about using an array if you want to delete a list of elements, because the traversal (perhaps similar to DOMNodeList behaviour) with a plain foreach breaks at the first deletion.
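The array-based variant without XPath could be sketched as follows. The feed in $xmlstr is a hypothetical, trimmed version of the question's sample; iterator_to_array() with false as its second argument is the call named above, and the brand check mirrors the question's loop.

```php
<?php
// Hypothetical feed, trimmed from the sample in the question.
$xmlstr = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item><g:id>a</g:id><g:brand><![CDATA[BRAND1]]></g:brand></item>
    <item><g:id>b</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
    <item><g:id>c</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
  </channel>
</rss>
XML;

$xmlobj = new SimpleXMLElement($xmlstr);
$gNs    = 'http://base.google.com/ns/1.0';

// Snapshot the items into a plain array first. The second argument must
// be false: every key would otherwise be "item" and overwrite the last.
$items = iterator_to_array($xmlobj->channel->item, false);

foreach ($items as $item) {
    $brand = (string) $item->children($gNs)->brand;
    if ($brand === 'BRAND2') {
        unset($item[0]); // self-reference removes the element itself
    }
}

echo $xmlobj->asXML();
```

Looping over the snapshot instead of $xmlobj->channel->item keeps the traversal independent of the deletions.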