I have been given an unusual goal: create an XML document without any CDATA sections from a multidimensional array, under these constraints:
- CDATA shouldn't be used
- The characters <, >, ', " and & must be encoded.
But when I set nodeValue after converting ' and " to &apos; and &quot;, the value is not retained in the XML node; the entities are unescaped again.
private function convertElement(DOMElement $element, $value)
{...
$value = htmlspecialchars($value ?? '', ENT_XML1 | ENT_QUOTES);
$element->nodeValue = $value;
The element does not retain the encoded values: &apos; and &quot; are decoded back to ' and ".
dd($element->nodeValue);
// <p itemprop='description'>'Test single quotes'</p>
// Should have retained
// &lt;p itemprop=&quot;description&quot;&gt;&apos;Test single quotes&apos;&lt;/p&gt;
And if I use appendChild(), the values get double-escaped.
// $value = 'Test Ampersand &amp;'
$textNode = $this->document->createTextNode($value);
$element->appendChild($textNode);
// Test Ampersand &amp;amp;
My issue seems to be related to the behaviour described in "DOMElement nodeValue inconsistant get vs set".
I'd appreciate any workarounds or suggestions.
Thanks!
2 Answers
Please try this
CDATASection nodes handle encodings differently, so it is understandable to avoid them. They exist mostly for backward compatibility and human readability.
Quotes only need to be escaped inside an attribute value that is delimited by the same kind of quote. The DOM serializer avoids unnecessary escaping, so your only option would be to write your own serializer.
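A custom serializer along those lines could be sketched as follows. This is a minimal illustration, not a complete implementation: serializeFullyEscaped is a hypothetical helper (it is not part of PHP's DOM extension), it assumes it is called on an element rather than on the document itself, and it does not emit namespace declarations or an XML declaration.

```php
<?php
// Hypothetical helper: serializes an element tree and escapes
// <, >, &, ' and " in both text nodes and attribute values.
function serializeFullyEscaped(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // DOMCdataSection extends DOMText, so CDATA sections
        // are written out as escaped text as well.
        return htmlspecialchars($node->data, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    }

    if ($node instanceof DOMElement) {
        $xml = '<' . $node->nodeName;
        foreach ($node->attributes as $attribute) {
            $xml .= ' ' . $attribute->nodeName . '="'
                . htmlspecialchars($attribute->value, ENT_XML1 | ENT_QUOTES, 'UTF-8')
                . '"';
        }
        if (!$node->hasChildNodes()) {
            return $xml . '/>';
        }
        $xml .= '>';
        foreach ($node->childNodes as $child) {
            $xml .= serializeFullyEscaped($child);
        }
        return $xml . '</' . $node->nodeName . '>';
    }

    // Comments, processing instructions, etc.: use the built-in serializer.
    return $node->ownerDocument->saveXML($node);
}

$document = new DOMDocument('1.0', 'UTF-8');
$item = $document->appendChild($document->createElement('item'));
$item->setAttribute('title', 'Tom & "Jerry"');
// The text node receives the raw, unescaped value.
$item->appendChild($document->createTextNode("<p>'Test' & \"more\"</p>"));

$xml = serializeFullyEscaped($item);
echo $xml, PHP_EOL;
// <item title="Tom &amp; &quot;Jerry&quot;">&lt;p&gt;&apos;Test&apos; &amp; &quot;more&quot;&lt;/p&gt;</item>
```

The DOM tree itself always holds the raw values; escaping is applied exactly once, at output time, which avoids the double escaping that happens when pre-escaped strings are stored in text nodes.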
The nodeValue property implementation in PHP does not match the DOM standard and is imho broken. It will only partially escape input.
The original DOM standard required you to create and add a text node. Current DOM (and PHP) has the textContent property.
Here is an example:
Output:
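The textContent and text-node approach described above can be sketched as follows. This is a minimal illustration rather than the answer's original listing; the element names and sample values are invented for the example. The last case shows how pre-escaping the value leads to the double escaping from the question.

```php
<?php
$document = new DOMDocument('1.0', 'UTF-8');
$root = $document->appendChild($document->createElement('root'));

$raw = 'Test Ampersand & \'single\' "double" <b>';

// textContent takes the raw string; escaping happens on serialization.
$viaTextContent = $root->appendChild($document->createElement('viaTextContent'));
$viaTextContent->textContent = $raw;

// An explicitly created text node behaves the same way.
$viaTextNode = $root->appendChild($document->createElement('viaTextNode'));
$viaTextNode->appendChild($document->createTextNode($raw));

// Pre-escaping first: the serializer escapes the "&" of "&amp;" again.
$preEscaped = $root->appendChild($document->createElement('preEscaped'));
$preEscaped->appendChild(
    $document->createTextNode(htmlspecialchars('Test Ampersand &', ENT_XML1 | ENT_QUOTES))
);

$lines = [];
foreach ($root->childNodes as $child) {
    $lines[] = $document->saveXML($child);
}
echo implode(PHP_EOL, $lines), PHP_EOL;
// <viaTextContent>Test Ampersand &amp; 'single' "double" &lt;b&gt;</viaTextContent>
// <viaTextNode>Test Ampersand &amp; 'single' "double" &lt;b&gt;</viaTextNode>
// <preEscaped>Test Ampersand &amp;amp;</preEscaped>
```

Note that the quotes stay unescaped in element content; the built-in serializer only escapes what XML requires there.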