I have a script that updates a configuration file with the current year, but for some reason the copyright symbol is not being inserted correctly. The PowerShell script is UTF-8 with BOM and the JSON file is UTF-8.
The workflow is that I read from a JSON file, update the copyright date, and then save it back to the JSON file.
The JSON file, info.json:
{
"CopyrightInfo": "Copyright © CompanyName 1992"
}
Reproducible excerpt of the PowerShell script:
$path = "./info.json"
$a = Get-Content $path| ConvertFrom-Json
$a.'CopyrightInfo' = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) CompanyName $((Get-Date).Year)"
$a | ConvertTo-Json | set-content $path
I’ve tried a bunch of ways; the above is the latest attempt. It looks fine when printed in PowerShell or opened in Notepad, but in any other editor (Visual Studio Code, SourceTree, the Azure DevOps file viewer, etc.) it always shows up as:
"CopyrightInfo": "Copyright � CompanyName 2022"
If anyone can explain what I’m doing wrong, that would be great, and even better if someone could also suggest a way to make it work properly.
I’m using PowerShell version 5.1.19041.1682
EDIT: Updated the issue with reproducible code excerpts and the PowerShell version used.
2 Answers
Can’t reproduce the issue.
To show the result in PowerShell or with any external program, see: Displaying Unicode in Powershell
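One common way to do that in Windows PowerShell, shown here only as a sketch (it is not taken from the linked answer), is to switch both PowerShell’s $OutputEncoding preference variable and the console’s own encodings to UTF-8; a console font that can actually render the character is assumed:
# Sketch: make the console and external-program I/O use UTF-8 (Windows PowerShell).
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding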
Given that you’re running Windows PowerShell and that you want to both read the input and create the output as UTF-8-encoded:
If it’s acceptable to create a UTF-8 file with a BOM (which is what Set-Content -Encoding utf8 invariably creates in Windows PowerShell), request that encoding explicitly on both the read and the write.
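A minimal sketch of that approach, reusing the path and property name from the question (the -Raw switch is my addition, so that ConvertFrom-Json receives the whole file as a single string):
$path = "./info.json"
# Read as UTF-8 explicitly; -Raw passes the file to ConvertFrom-Json as one string.
$a = Get-Content -Raw -Encoding utf8 $path | ConvertFrom-Json
$a.CopyrightInfo = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) CompanyName $((Get-Date).Year)"
# In Windows PowerShell, -Encoding utf8 writes UTF-8 *with* a BOM.
$a | ConvertTo-Json | Set-Content -Encoding utf8 $path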
Creating a UTF-8 file without a BOM requires more work in Windows PowerShell (whereas that encoding is now the consistent default in PowerShell (Core) 7+); one option takes advantage of the – curious – fact that New-Item, when given a -Value argument, invariably creates files with that encoding, i.e. BOM-less UTF-8.
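A sketch of that workaround, again based on the question's file (the trailing newline is appended manually because New-Item writes the -Value string exactly as given):
$path = "./info.json"
$a = Get-Content -Raw -Encoding utf8 $path | ConvertFrom-Json
$a.CopyrightInfo = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) CompanyName $((Get-Date).Year)"
# New-Item with -Value creates a BOM-less UTF-8 file; -Force overwrites the existing one,
# and assigning to $null discards the FileInfo object that New-Item returns.
$null = New-Item -Force $path -Value (($a | ConvertTo-Json) + [Environment]::NewLine)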
Note:
On reading: PowerShell recognizes Unicode BOMs automatically, but the encoding assumed in the absence of a BOM depends on the PowerShell edition, both when reading source code and when reading files via cmdlets such as Get-Content:
- Windows PowerShell assumes the system’s legacy ANSI code page (aka the language for non-Unicode programs).
- PowerShell (Core) assumes UTF-8.
On writing: Once a file is read, PowerShell does not preserve information about the input file’s original character encoding – the content is stored in .NET strings (which are composed of in-memory UTF-16LE code units), even when the data is simply passed through the pipeline. As such, it is the file-writing cmdlet’s own default encoding that is used if no -Encoding argument is specified, irrespective of where the data came from; specifically (a short demonstration follows the list):
- Windows PowerShell’s Set-Content defaults to the system’s legacy ANSI encoding; unfortunately, other cmdlets have different defaults – notably, Out-File and its virtual alias > default to UTF-16LE ("Unicode") – see the bottom section of this answer for details.
- PowerShell (Core) now, fortunately, defaults to BOM-less UTF-8 across all cmdlets.
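Purely to illustrate those differing Windows PowerShell write defaults (t1.txt and t2.txt are throwaway names chosen for this sketch, not part of the original answer):
$s = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) Test"
$s | Set-Content t1.txt   # default: the system's ANSI code page, no BOM
$s | Out-File t2.txt      # default: UTF-16LE ("Unicode"), with a BOM
Get-Content -Encoding Byte -TotalCount 2 t2.txt   # 255 254, i.e. the FF FE UTF-16LE BOM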