I have an XML document that is UTF-16 LE encoded. Because of that, it is not readable using WP All Import.
When I look at the XML declaration, I see this:
<?xml version="1.0" encoding="Unicode" ?>
And at the bottom of the Visual Studio Code window I see:
UTF-16 LE
I already changed the encoding using Visual Studio Code, but since there is going to be a new file every time (in the same format), it would be great if PHP could transform it into UTF-8.
<?xml version="1.0" encoding="Unicode" ?>
<root>
<docs>
Is it possible to change the encoding of this file using PHP?
2 Answers
Here is a generic XSLT that will copy your entire input XML as-is, but with the encoding specified in the xsl:output. What is left is just to run an XSLT transformation in PHP.
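A minimal sketch of that approach, assuming PHP's dom and xsl extensions are enabled. The stylesheet is the standard XSLT 1.0 identity transform with UTF-8 set in xsl:output; the sample input and variable names are illustrative assumptions, not taken from the original answer.

```php
<?php
// Identity transform: copies every node and attribute unchanged.
// The encoding attribute on xsl:output decides how the result is serialized.
$xslt = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="no"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical sample input, built in memory as UTF-16LE bytes.
// A real script would read the file instead, e.g. with $source->load($path).
$utf16 = iconv(
    'UTF-8',
    'UTF-16LE',
    '<?xml version="1.0" encoding="UTF-16LE"?><root><docs>äöü</docs></root>'
);

$source = new DOMDocument();
$source->loadXML($utf16);

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xslt);

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);

// transformToXml() returns the result serialized as requested by xsl:output.
$utf8 = $processor->transformToXml($source);
echo $utf8;
```

Note that the sample declares UTF-16LE. Whether the parser accepts the label Unicode from the question may depend on the underlying libxml2 build.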
DOMDocument::loadXML() reads the encoding attribute from the XML declaration. But Unicode is not a valid encoding afaik, I would expect UTF-16LE. The DOM API in PHP uses UTF-8. So it will decode anything to UTF-8 (depending on the defined encoding) and encode it depending on the encoding of the target document. You can just change it after loading.
Here is a demo:
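A minimal sketch of such a demo, assuming a small UTF-8 sample document. The sample data and variable names are illustrative and not taken from the original answer. It loads the document once, then serializes it again after each change to the encoding property.

```php
<?php
// Sample UTF-8 input with non-ASCII characters (an assumption for illustration).
$xml = '<?xml version="1.0" encoding="UTF-8"?><root><docs>äöü</docs></root>';

$document = new DOMDocument();
$document->loadXML($xml);

$results = [];
foreach (['ASCII', 'UTF-16', 'UTF-16LE', 'UTF-16BE'] as $encoding) {
    // Changing the property only affects how saveXML() serializes the document.
    $document->encoding = $encoding;
    $results[$encoding] = $document->saveXML();
}

// ASCII output is plain text and can be printed directly.
echo $results['ASCII'];

// The UTF-16 variants are not readable on a UTF-8 terminal,
// so print their first bytes as hex instead.
foreach (['UTF-16', 'UTF-16LE', 'UTF-16BE'] as $encoding) {
    echo $encoding, ': ', bin2hex(substr($results[$encoding], 0, 4)), "\n";
}
```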
Output:
The generated string changes with the defined encoding.
I started with a UTF-8 document here, because SO is UTF-8 itself and you can see the non-ASCII characters that way. ASCII triggers the entity encoding for non-ASCII characters. UTF-16 adds a BOM to provide the byte order. SO cannot display the UTF-16 encoded chars, so you get the � symbol. UTF-16LE and UTF-16BE define the byte order in the encoding, so no BOM is needed.
Of course it works the same the other way around.
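The reverse direction, UTF-16LE in and UTF-8 out, is what the question asks for. The sketch below is one possible way to do it, not the answer's original code. It assumes the input really is UTF-16LE, and it decodes the bytes with iconv() before parsing instead of relying on the declaration, because the file in the question is labelled Unicode, which the parser may not recognize. The function name utf16leXmlToUtf8 and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: convert UTF-16LE XML bytes to a UTF-8 XML string.
function utf16leXmlToUtf8(string $bytes): string
{
    // Decode the raw bytes; this assumes the input really is UTF-16LE.
    $text = iconv('UTF-16LE', 'UTF-8', $bytes);
    if ($text === false) {
        throw new RuntimeException('Input is not valid UTF-16LE.');
    }

    // Remove a leading byte order mark (U+FEFF as UTF-8), if there is one.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($text, 3);
    }

    // Relabel the declaration so the parser sees UTF-8 instead of "Unicode".
    $text = preg_replace(
        '/^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2/',
        '${1}${2}UTF-8${2}',
        $text
    );

    $document = new DOMDocument();
    if (!$document->loadXML($text)) {
        throw new RuntimeException('Input is not well-formed XML.');
    }
    // Serialize as UTF-8, as described in the answer above.
    $document->encoding = 'UTF-8';

    return $document->saveXML();
}

// Hypothetical sample: UTF-16LE bytes with a BOM and the label from the question.
$sample = "\xFF\xFE" . iconv(
    'UTF-8',
    'UTF-16LE',
    '<?xml version="1.0" encoding="Unicode" ?><root><docs>äöü</docs></root>'
);
echo utf16leXmlToUtf8($sample);
```

For a file on disk, the same helper could be wrapped as file_put_contents($target, utf16leXmlToUtf8(file_get_contents($source))), where $source and $target are placeholder paths.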