
When I write a serialized object to a file, I end up with a JSON file without any diacritic characters (Polish, in this case). Did I miss something obvious, or have I omitted a lot of setup needed to do this properly?

I’m using System.Text.Json

FileStream stream = new FileStream(filePath, FileMode.Create);
           
var stringoo = System.Text.Json.JsonSerializer.Serialize(paginatedData);

using (StreamWriter writetext = new StreamWriter(stream))
{
    writetext.Write(stringoo);
}

The object paginatedData is a result table from a DB query, organized and displayed in a custom view model. It is displayed properly in VS and in .html views, and it is successfully exported to an .xlsx file, all without any encoding issues.

The string stringoo is displayed properly in the VS JSON visualizer (when hitting a breakpoint).

I tried using (StreamWriter writetext = new StreamWriter(stream, Encoding.Unicode)), without success.

I would appreciate any help or direction for further research.

2 Answers


  1. Specify the encoding explicitly when creating your StreamWriter, using Encoding.UTF8 instead of Encoding.Unicode:

    using (StreamWriter writetext = new StreamWriter(stream, Encoding.UTF8))
    {
        writetext.Write(stringoo);
    }
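A minimal sketch of this suggestion, in case it helps to verify the file encoding in isolation. The class name, the WriteUtf8 helper, the sample string, and the temp-file path are all hypothetical and exist only for this illustration. It writes the text with an explicit UTF-8 encoding and reads the file back to check that it round-trips. Note that StreamWriter already uses UTF-8 when no encoding is passed, so if the file still contains \uXXXX sequences after this change, the escaping is likely coming from the serializer rather than from the writer.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8WriteSketch
{
    // Hypothetical helper: writes text to a file as UTF-8 without a byte order mark.
    // Passing Encoding.UTF8 instead would write the same bytes preceded by a BOM.
    public static void WriteUtf8(string filePath, string text)
    {
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

        using (var stream = new FileStream(filePath, FileMode.Create))
        using (var writer = new StreamWriter(stream, utf8NoBom))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        // Sample data and path are assumptions made for this sketch.
        string json = "{\"city\":\"Łódź\"}";
        string filePath = Path.Combine(Path.GetTempPath(), "utf8-write-sketch.json");

        WriteUtf8(filePath, json);

        // Read the file back as UTF-8 and confirm the text round-trips.
        string readBack = File.ReadAllText(filePath, Encoding.UTF8);
        Console.WriteLine(readBack == json);
    }
}
```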
    
  2. By default, the JSON serializer only retains printable ASCII characters; other characters are escaped as \uXXXX. To retain particular characters, you need to specify a non-default encoder:

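    using System.Text.Encodings.Web;
    using System.Text.Json;
    using System.Text.Unicode;
    
    // Latin1Supplement is included because Polish "ó"/"Ó" (U+00F3/U+00D3)
    // lie in that block rather than in Latin Extended-A.
    var encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin,
                                           UnicodeRanges.Latin1Supplement,
                                           UnicodeRanges.LatinExtendedA);
    var options = new JsonSerializerOptions { Encoder = encoder };
    var stringoo = JsonSerializer.Serialize(paginatedData, options);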
    

    Also note that there are two kinds of diacritic characters in Unicode: a single precomposed character, and an ASCII letter followed by a combining diacritical mark. For instance, ę (\u0119) and ę (e\u0328) look the same; the former belongs to Latin Extended-A, while the combining mark in the latter belongs to Combining Diacritical Marks. The code above only prevents the former from being escaped.
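The two points above can be sketched together as follows. This is an illustration under stated assumptions, not the answerer's exact code: the class name, the SerializeReadable helper, and the sample strings are hypothetical. The sketch passes UnicodeRanges.Latin1Supplement to the encoder because ó (U+00F3) lies in that block rather than in Latin Extended-A. It also normalizes the input to NFC so that a letter followed by a combining mark is composed into the single precomposed character before serialization.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

public static class EncoderSketch
{
    // Latin1Supplement is an addition made for this sketch: Polish "ó" is U+00F3,
    // which lies in Latin-1 Supplement rather than in Latin Extended-A.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(
            UnicodeRanges.BasicLatin,
            UnicodeRanges.Latin1Supplement,
            UnicodeRanges.LatinExtendedA)
    };

    // Hypothetical helper: composes letter + combining mark pairs (NFC) first,
    // then serializes with the encoder configured above.
    public static string SerializeReadable(string value) =>
        JsonSerializer.Serialize(value.Normalize(NormalizationForm.FormC), Options);

    public static void Main()
    {
        string precomposed = "zażółć gęślą"; // sample text with precomposed letters
        string decomposed = "e\u0328";        // 'e' + combining ogonek, renders like "ę"

        // Default options: non-ASCII characters come out as \uXXXX escapes.
        Console.WriteLine(JsonSerializer.Serialize(precomposed));

        // Custom encoder: the Polish letters are written as-is.
        Console.WriteLine(SerializeReadable(precomposed));

        // After NFC normalization both spellings of "ę" serialize identically.
        Console.WriteLine(SerializeReadable(decomposed) == SerializeReadable("\u0119"));
    }
}
```

Normalizing each string by hand only works for values you control. For a whole view model such as paginatedData, the strings would have to be normalized before they reach the serializer, for example when the model is populated.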
