I'd like to parse a string from an XML file (RSS) to retrieve and display the image link in the node "<enc:enclosure rdf:resource="https://www.science.org/… .jpg".
It looks to me as if this node involves two different namespaces, and so far I have found no similar question or example that gets this working.
The simplified code example below shows what works as expected, and that the link in the node "<enc:enclosure rdf:resource="https://www.science.org/… .jpg" cannot be displayed the same way.
<?php
$xml_string = '<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:enc="http://purl.oclc.org/net/rss_2.0/enc/" xmlns:cc="http://web.resource.org/cc/" xmlns="http://purl.org/rss/1.0/">
<item>
<title><![CDATA[Canada moves to ban funding for ‘risky’ foreign collaborations]]></title>
<link>https://www.science.org/content/article/canada-moves-ban-funding-risky-foreign-collaborations</link>
<description><![CDATA[China is seen as main target in rejecting joint projects with certain foreign entities]]></description>
<enc:enclosure rdf:resource="https://www.science.org/do/10.1126/science.adh2317/rss/_20230217_nid_canada_china.jpg" enc:length="165061" enc:type="image/jpeg" />
<dc:title><![CDATA[Canada moves to ban funding for ‘risky’ foreign collaborations]]></dc:title>
<dc:identifier>doi:10.1126/science.adh2317</dc:identifier>
<dc:date>2023-02-17T05:55:00Z</dc:date>
<dc:creator>Jeffrey Mervis</dc:creator>
<prism:publicationName><![CDATA[Canada moves to ban funding for ‘risky’ foreign collaborations]]></prism:publicationName>
<prism:coverDate>2023-02-17T05:55:00Z</prism:coverDate>
<prism:coverDisplayDate>2023-02-17T05:55:00Z</prism:coverDisplayDate>
<prism:doi>10.1126/science.adh2317</prism:doi>
<prism:url>https://www.science.org/content/article/canada-moves-ban-funding-risky-foreign-collaborations</prism:url>
</item></rdf:RDF>';
$xml = simplexml_load_string($xml_string);
foreach ($xml->item as $item) {
    if ($item->children('http://purl.oclc.org/net/rss_2.0/enc/')) {
        foreach ($item->children('http://purl.oclc.org/net/rss_2.0/enc/') as $eintrag1) {
            echo '<pre>'; print_r($eintrag1); echo '</pre>'; // is working
            echo 'Length: ' . $eintrag1['length'] . '<br />'; // is working
            $eintrag2 = $eintrag1->children('http://www.w3.org/1999/02/22-rdf-syntax-ns#');
            echo '<pre>'; print_r($eintrag2); echo '</pre>'; // is working
            echo 'Resource: ' . $eintrag2['resource'] . '<br />'; // NOT working!!! Only empty output, but it's the link I would like to extract!
        }
    }
}
?>
It looks simple, and I thought I could handle problems like this with my limited PHP skills, but none of my approaches (e.g. DOM, SimpleXML, XPath) produced the desired result.
If someone finds the time to help me find the answer, I would appreciate it very much. Thanks in advance.
2 Answers
SimpleXML is known to have issues with namespaces; try DOM + DOMXPath instead.
SimpleXML does some implicit namespace switching. You already used the explicit syntax for the children; you can do the same for the attributes by calling attributes() with the namespace URI.
However, I suggest defining a constant or variable with all the namespaces you are using. This will make your code a lot more readable. The keys can be different from the prefixes in the document.
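A minimal sketch of that approach, assuming a trimmed-down copy of the feed from the question. The constant name XMLNS, its keys, and the $resources array are illustrative choices for this sketch, not part of any API:

```php
<?php
// Namespace map: the keys are local aliases chosen for this sketch and
// do not have to match the prefixes used in the feed.
const XMLNS = [
    'rss' => 'http://purl.org/rss/1.0/',
    'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'enc' => 'http://purl.oclc.org/net/rss_2.0/enc/',
];

// Shortened version of the XML from the question.
$xml_string = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:enc="http://purl.oclc.org/net/rss_2.0/enc/" xmlns="http://purl.org/rss/1.0/">
<item>
<link>https://www.science.org/content/article/canada-moves-ban-funding-risky-foreign-collaborations</link>
<enc:enclosure rdf:resource="https://www.science.org/do/10.1126/science.adh2317/rss/_20230217_nid_canada_china.jpg" enc:length="165061" enc:type="image/jpeg" />
</item>
</rdf:RDF>
XML;

$xml = simplexml_load_string($xml_string);
$resources = [];

// Select the <item> elements explicitly in the RSS 1.0 namespace.
foreach ($xml->children(XMLNS['rss'])->item as $item) {
    // Select <enc:enclosure> by its namespace URI, not by its prefix.
    foreach ($item->children(XMLNS['enc'])->enclosure as $enclosure) {
        // Attributes are fetched per namespace as well.
        $encAttributes = $enclosure->attributes(XMLNS['enc']);
        $rdfAttributes = $enclosure->attributes(XMLNS['rdf']);

        $resources[] = (string) $rdfAttributes['resource'];
        echo 'Length: ', $encAttributes['length'], "\n";
        echo 'Resource: ', $rdfAttributes['resource'], "\n";
    }
}
```

The key difference from the code in the question is that rdf:resource is an attribute of the enclosure element, so it is read with attributes() and the RDF namespace URI instead of children().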
DOM
DOM is more explicit and has a set of namespace-aware methods with the suffix NS (for example getAttributeNS()). DOMXPath::evaluate() allows for complex expressions to fetch nodes and scalar values.
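As a rough illustration of the DOM route, here is a sketch that again assumes a shortened copy of the feed from the question. The $namespaces array, its alias keys, and the $links and $length variables are names made up for this example:

```php
<?php
// Alias => namespace URI; the aliases are registered on the XPath object
// and are independent of the prefixes used in the document.
$namespaces = [
    'rss' => 'http://purl.org/rss/1.0/',
    'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'enc' => 'http://purl.oclc.org/net/rss_2.0/enc/',
];

// Shortened version of the XML from the question.
$xml_string = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:enc="http://purl.oclc.org/net/rss_2.0/enc/" xmlns="http://purl.org/rss/1.0/">
<item>
<link>https://www.science.org/content/article/canada-moves-ban-funding-risky-foreign-collaborations</link>
<enc:enclosure rdf:resource="https://www.science.org/do/10.1126/science.adh2317/rss/_20230217_nid_canada_china.jpg" enc:length="165061" enc:type="image/jpeg" />
</item>
</rdf:RDF>
XML;

$document = new DOMDocument();
$document->loadXML($xml_string);

$xpath = new DOMXPath($document);
foreach ($namespaces as $alias => $uri) {
    $xpath->registerNamespace($alias, $uri);
}

$links = [];
$length = null;

// A location path returns a DOMNodeList that can be iterated.
foreach ($xpath->evaluate('/rdf:RDF/rss:item') as $item) {
    // string() casts the first matching attribute node to a scalar value.
    $links[] = $xpath->evaluate('string(enc:enclosure/@rdf:resource)', $item);

    // The namespace-aware DOM methods work on the element node directly.
    $enclosure = $xpath->evaluate('enc:enclosure', $item)->item(0);
    if ($enclosure instanceof DOMElement) {
        $length = $enclosure->getAttributeNS($namespaces['enc'], 'length');
    }
}

foreach ($links as $link) {
    echo 'Resource: ', $link, "\n";
}
echo 'Length: ', $length, "\n";
```

Because string() yields an empty string when nothing matches, an item without an enclosure would produce an empty entry in $links here; filtering those out is left to the caller.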