I am trying to filter the results of an XML feed generated for Facebook. Currently, the feed looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <title><![CDATA[Title]]></title>
    <link><![CDATA[https:/path/]]></link>
    <description>WooCommerce Product List RSS feed</description>
    <metadata>
      <ref_application_id>451257172939091</ref_application_id>
    </metadata>
    <item>
      <g:id>anID</g:id>
      <g:inventory>5</g:inventory>
      <g:description><![CDATA[Some Text]]></g:description>
      <g:condition>new</g:condition>
      <g:mpn>sku</g:mpn>
      <g:title>Product Title</g:title>
      <g:availability>in stock</g:availability>
      <g:price>185.00 EUR</g:price>
      <g:brand><![CDATA[BRAND1]]></g:brand>
    </item>
    <item>
      <g:id>anID</g:id>
      <g:inventory>5</g:inventory>
      <g:description><![CDATA[Some Text]]></g:description>
      <g:condition>new</g:condition>
      <g:mpn>sku</g:mpn>
      <g:title>Product Title</g:title>
      <g:availability>in stock</g:availability>
      <g:price>185.00 EUR</g:price>
      <g:brand><![CDATA[BRAND2]]></g:brand>
    </item>
    <item>
      <g:id>anID</g:id>
      <g:inventory>5</g:inventory>
      <g:description><![CDATA[Some Text]]></g:description>
      <g:condition>new</g:condition>
      <g:mpn>sku</g:mpn>
      <g:title>Product Title</g:title>
      <g:availability>in stock</g:availability>
      <g:price>185.00 EUR</g:price>
      <g:brand><![CDATA[BRAND2]]></g:brand>
    </item>
............

I need to remove some nodes based on the brand value. My code currently looks like this:

$xmlstr = get_xml_from_url('urlToXMLFeed/xml/testfeed1.xml');
$xmlobj = new SimpleXMLElement($xmlstr);
$xmlobj->registerXPathNamespace("g", "http://base.google.com/ns/1.0");
$i = 0;

foreach($xmlobj->channel->item as $item)
{
    $namespaces = $item->getNameSpaces(true);
    // echo $namespaces;
    $gbrand = $item->children($namespaces['g']);
    $finalBrand = $gbrand->brand;
    if(strcmp($finalBrand,"BRAND2") == 0)
    {
        
        unset($item);
    }
    $i ++;
}

//Format XML to save indented tree rather than one line and save
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xmlobj->asXML());
$dom->save('fileName.xml'); 

The newly generated XML still contains the nodes with BRAND2. I also tried removing the node like this:

unset($xmlobj->channel->item->{$i});

But this second approach only removes the first occurrence of BRAND2.

I also tried the following:

$domElemsToRemove = array();
.......
    if(strcmp($finalBrand,"BRAND2") == 0)
    {
        $domElemsToRemove[] = $item;
    }
}
....
foreach( $domElemsToRemove as $domElement ){
    $xmlobj->channel->removeChild($domElement);
}

But the result is still the same.

3 Answers


  1. Chosen as BEST ANSWER

    I found a possible solution for my case. According to a PHP comment here, nodes cannot be removed directly inside the loop, as I tried in my first approach, because doing so breaks the iteration. Instead, we have to build a queue of the nodes we wish to remove. Then, based on the answer here, the solution is this:

    foreach( $domElemsToRemove as $domElement )
    {
        $dom = dom_import_simplexml($domElement);
        $dom->parentNode->removeChild($dom);
    }
    

    That removes all the nodes from the XML object properly!
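
    Putting the two steps together, a minimal self-contained sketch of this queue-then-remove approach could look like the following. The inline sample feed and the $brandToRemove variable are assumptions for illustration only; they stand in for the real feed and filter value.

```php
<?php
// Hypothetical inline sample standing in for the real feed.
$xmlstr = <<<EOXML
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item><g:id>1</g:id><g:brand><![CDATA[BRAND1]]></g:brand></item>
    <item><g:id>2</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
    <item><g:id>3</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
  </channel>
</rss>
EOXML;

$xmlobj = new SimpleXMLElement($xmlstr);
$brandToRemove = 'BRAND2'; // assumed filter value

// First pass: only collect the matching items, do not modify the tree yet.
$domElemsToRemove = array();
foreach ($xmlobj->channel->item as $item) {
    $g = $item->children('http://base.google.com/ns/1.0');
    if ((string) $g->brand === $brandToRemove) {
        $domElemsToRemove[] = $item;
    }
}

// Second pass: remove each queued node through its DOM counterpart.
foreach ($domElemsToRemove as $domElement) {
    $dom = dom_import_simplexml($domElement);
    $dom->parentNode->removeChild($dom);
}

// Print how many <item> elements are left in the channel.
echo count($xmlobj->channel->item), "\n";
```

    Because dom_import_simplexml() returns a DOM node that shares the same underlying document, the removal is visible through $xmlobj afterwards, so $xmlobj->asXML() can be saved as before.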


  2. Here is a solution based on XSLT.

    It removes all <item> elements whose g:brand is 'BRAND2'.

    Input XML

    <?xml version="1.0" encoding="UTF-8"?>
    <rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
        <channel>
            <title><![CDATA[Title]]></title>
            <link><![CDATA[https:/path/]]></link>
            <description>WooCommerce Product List RSS feed</description>
            <metadata>
                <ref_application_id>451257172939091</ref_application_id>
            </metadata>
            <item>
                <g:id>anID</g:id>
                <g:inventory>5</g:inventory>
                <g:description><![CDATA[Some Text]]></g:description>
                <g:condition>new</g:condition>
                <g:mpn>sku</g:mpn>
                <g:title>Product Title</g:title>
                <g:availability>in stock</g:availability>
                <g:price>185.00 EUR</g:price>
                <g:brand><![CDATA[BRAND1]]></g:brand>
            </item>
            <item>
                <g:id>anID</g:id>
                <g:inventory>5</g:inventory>
                <g:description><![CDATA[Some Text]]></g:description>
                <g:condition>new</g:condition>
                <g:mpn>sku</g:mpn>
                <g:title>Product Title</g:title>
                <g:availability>in stock</g:availability>
                <g:price>185.00 EUR</g:price>
                <g:brand><![CDATA[BRAND2]]></g:brand>
            </item>
            <item>
                <g:id>anID</g:id>
                <g:inventory>5</g:inventory>
                <g:description><![CDATA[Some Text]]></g:description>
                <g:condition>new</g:condition>
                <g:mpn>sku</g:mpn>
                <g:title>Product Title</g:title>
                <g:availability>in stock</g:availability>
                <g:price>185.00 EUR</g:price>
                <g:brand><![CDATA[BRAND2]]></g:brand>
            </item>
        </channel>
    </rss>
    

    XSLT

    <?xml version='1.0'?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:g="http://base.google.com/ns/1.0">
        <xsl:output method="xml" indent="yes" cdata-section-elements="title link g:description g:brand" encoding="utf-8" omit-xml-declaration="no"/>
        <xsl:strip-space elements="*"/>
    
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
        <xsl:template match="item[g:brand='BRAND2']"/>
    </xsl:stylesheet>
    

    Output XML

    <?xml version='1.0' encoding='utf-8' ?>
    <rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
      <channel>
        <title><![CDATA[Title]]></title>
        <link><![CDATA[https:/path/]]></link>
        <description>WooCommerce Product List RSS feed</description>
        <metadata>
          <ref_application_id>451257172939091</ref_application_id>
        </metadata>
        <item>
          <g:id>anID</g:id>
          <g:inventory>5</g:inventory>
          <g:description><![CDATA[Some Text]]></g:description>
          <g:condition>new</g:condition>
          <g:mpn>sku</g:mpn>
          <g:title>Product Title</g:title>
          <g:availability>in stock</g:availability>
          <g:price>185.00 EUR</g:price>
          <g:brand><![CDATA[BRAND1]]></g:brand>
        </item>
      </channel>
    </rss>
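
    Since the question is about PHP, here is a hedged sketch of how such a stylesheet might be applied from PHP. It assumes the PHP "xsl" extension (XSLTProcessor) is enabled; the shortened inline feed and the variable names are illustrative only.

```php
<?php
// Assumption: the PHP "xsl" extension is installed and enabled.
// Shortened, hypothetical sample feed for illustration.
$xmlstr = <<<EOXML
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item><g:id>1</g:id><g:brand><![CDATA[BRAND1]]></g:brand></item>
    <item><g:id>2</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
    <item><g:id>3</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
  </channel>
</rss>
EOXML;

// Identity transform plus an empty template that drops the BRAND2 items.
$xslstr = <<<EOXSL
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:g="http://base.google.com/ns/1.0">
  <xsl:output method="xml" indent="yes" cdata-section-elements="title link g:description g:brand" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="item[g:brand='BRAND2']"/>
</xsl:stylesheet>
EOXSL;

$xml = new DOMDocument();
$xml->loadXML($xmlstr);

$xsl = new DOMDocument();
$xsl->loadXML($xslstr);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// transformToXml() returns the transformed document as a string.
$result = $proc->transformToXml($xml);
echo $result;
```

    For a real feed, the two documents would typically be loaded with DOMDocument::load() from files instead of inline strings.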
    
  3. You’re already pretty close. With PHP SimpleXML:

    $xmlobj = new SimpleXMLElement($xmlstr);
    
    foreach ($xmlobj->xpath('//item[g:brand="BRAND2"]') as $item) {
        unset($item[0]);
    }
    
    # output ...
    

    See it running online: https://3v4l.org/nWOrY

    This works because:

    1. unset($item) only unsets the local variable. Calling unset on the SimpleXMLElement self-reference $item[0] removes the element itself from the XML document.

    2. Traversal with foreach ($xmlobj->channel->item as $item) is invalidated if unset($item[0]) is called inside the loop body (PHP Fatal error: Uncaught Error: SimpleXMLElement is not properly initialized). Iterating over an array of SimpleXMLElements is immune to this, and $xmlobj->xpath() already returns such an array.

      If you do not want to use SimpleXMLElement::xpath(), such an array can also be created with iterator_to_array($xmlobj->channel->item, false) (the second parameter must be false with SimpleXML, because the iteration keys are the element names and identical keys would overwrite each other).

    3. XPath in SimpleXML already has the document's namespaces registered ("g" for "http://base.google.com/ns/1.0", as declared on the XML document element).

    4. XPath itself is powerful for querying the document for elements (or attributes).
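
    As an illustration of the iterator_to_array() variant from point 2, a minimal sketch without XPath might look like this. The inline sample feed and the $ns variable are assumptions made for the example.

```php
<?php
// Hypothetical inline sample standing in for the real feed.
$xmlstr = <<<EOXML
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item><g:id>1</g:id><g:brand><![CDATA[BRAND1]]></g:brand></item>
    <item><g:id>2</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
    <item><g:id>3</g:id><g:brand><![CDATA[BRAND2]]></g:brand></item>
  </channel>
</rss>
EOXML;

$xmlobj = new SimpleXMLElement($xmlstr);
$ns = 'http://base.google.com/ns/1.0'; // namespace URI bound to the "g" prefix

// Snapshot the items into a plain array first. The second argument is false
// so that the identical "item" keys do not overwrite each other.
$items = iterator_to_array($xmlobj->channel->item, false);

foreach ($items as $item) {
    if ((string) $item->children($ns)->brand === 'BRAND2') {
        // Self-reference: removes the element from the document,
        // not just the loop variable.
        unset($item[0]);
    }
}

echo $xmlobj->asXML();
```

    The loop runs over the array snapshot rather than the live SimpleXML traversal, which is why deleting elements inside it does not break the iteration.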

    Some references:

    Compare with the answer Remove a child with a specific attribute, in SimpleXML for PHP (2008 by Stefan Gehrig) which is nearly identical to your scenario, different only in the decision on attribute – not element – content and don’t miss the "By the way: selecting specific nodes is much more simple when you use XPath" part at the end.

    Then compare with my 2013 answer there that shows the SimpleXMLElement self-reference (with unset). It is a concept that dates back to earlier contributions on Stack Overflow, for example in Remove multiple empty nodes with SimpleXML (2012 by Beshoy Girgis), and there should be a couple more.

    Perhaps Kamil Szot coined the term SimpleXMLElement self-reference back in June 2010 with How can I set text value of SimpleXmlElement without using its parent?

    And some closing comment:

    I am writing this as well because, as far as SimpleXML is concerned, DOMNodeList or importing into DOM (dom_import_simplexml()) should not be necessary for element deletion. It is more about using an array when you want to delete a list of elements, because a plain foreach traversal (perhaps similar to DOMNodeList behaviour) breaks with the first deletion.
