I have a huge (approx. 50 GB) JSON file to deserialize. The JSON file consists of 14 arrays, and a short example of it can be found here.
I wrote my POCO file, declaring 15 classes (one for each array, plus a root class), and now I am trying to get my data in. Since the original data are huge and come in a zip file, I am trying not to unpack the whole thing. Hence the use of System.IO.Compression in the following code.
using System.IO.Compression;
using System.Text.Json;
using System.Text.Json.Nodes;

namespace read_and_parse
{
    internal class Program
    {
        static void Main()
        {
            var fc = new Program();
            string zip_path = @"C:ProjectsBBRDownload_Totalexample_json.zip";
            using FileStream file = File.OpenRead(zip_path);
            using (var zip = new ZipArchive(file, ZipArchiveMode.Read))
            {
                foreach (ZipArchiveEntry entry in zip.Entries)
                {
                    string[] name_split = entry.Name.Split('_');
                    string name = name_split.Last().Substring(0, name_split.Last().Length - 5);
                    bool canConvert = long.TryParse(name, out long number1);
                    if (canConvert == true)
                    {
                        Task task = fc.ParseJsonFromZippedFile(entry);
                    }
                }
            }
        }

        private async Task ParseJsonFromZippedFile(ZipArchiveEntry entry)
        {
            JsonSerializerOptions options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
            await using Stream entryStream = entry.Open();
            IAsyncEnumerable<JsonNode?> enumerable = JsonSerializer.DeserializeAsyncEnumerable<JsonNode>(entryStream, options);
            await foreach (JsonNode? obj in enumerable)
            {
                // Parse only subset of the object
                JsonNode? bbrSagNode = obj?["BBRSaglist"];
                if (bbrSagNode is null) continue;
                else
                {
                    var bbrSag = bbrSagNode.Deserialize<BBRSagList>();
                }
            }
        }
    }
}
Unfortunately, I do not get anything out of it: it fails in the foreach loop of the task, with a System.Threading.Tasks.VoidTaskResult.
How do I get the data deserialized?
3 Answers
Due to my company's security policy I can't access the example data. Please check whether your JSON has no root element, i.e. it is just a JSON array/list …
I made an example. Try using the ToListAsync extension; you can then deserialize each object and add it to a main list, etc.
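As a rough illustration of that idea, here is a minimal sketch that streams a root-level JSON array with System.Text.Json and collects the items by hand instead of relying on a ToListAsync extension. It assumes .NET 6 or later, and the names RootArrayReader and ReadAllAsync are made up for this example. It only works when the root of the document is an array.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Threading.Tasks;

public static class RootArrayReader
{
    // Streams the elements of a root-level JSON array one at a time
    // and collects them into a list (hypothetical helper).
    public static async Task<List<JsonNode?>> ReadAllAsync(Stream utf8Json)
    {
        var items = new List<JsonNode?>();

        // DeserializeAsyncEnumerable yields each array element as soon as
        // it has been read, without buffering the whole document.
        await foreach (JsonNode? item in JsonSerializer.DeserializeAsyncEnumerable<JsonNode>(utf8Json))
        {
            items.Add(item);
        }

        return items;
    }
}
```

For a file in the tens of gigabytes, collecting everything into one list probably defeats the purpose of streaming; handling each item inside the loop keeps memory usage flat.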
The Main method should be marked as async, and you need to await the task returned by ParseJsonFromZippedFile.
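A minimal sketch of that change, under a few assumptions: the zip path comes from the command line, ProcessZipAsync is a hypothetical helper, and the body of ParseJsonFromZippedFile is only a placeholder that drains the entry stream. The point is that every call is awaited, so Main does not return (and the archive is not disposed) while an entry is still being read.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

internal static class Program
{
    // 'async Task Main' allows the entry point to await asynchronous work.
    private static async Task Main(string[] args)
    {
        // Assumption: the path to the zip file is the first argument.
        using FileStream file = File.OpenRead(args[0]);
        int processed = await ProcessZipAsync(file);
        Console.WriteLine($"Processed {processed} entries.");
    }

    // Hypothetical helper: processes the entries one after another.
    internal static async Task<int> ProcessZipAsync(Stream zipStream)
    {
        int processed = 0;
        using var zip = new ZipArchive(zipStream, ZipArchiveMode.Read);
        foreach (ZipArchiveEntry entry in zip.Entries)
        {
            // Awaiting keeps the archive open until this entry is done.
            await ParseJsonFromZippedFile(entry);
            processed++;
        }
        return processed;
    }

    // Placeholder for the method from the question.
    private static async Task ParseJsonFromZippedFile(ZipArchiveEntry entry)
    {
        await using Stream entryStream = entry.Open();
        // Real parsing would go here; this sketch just reads to the end.
        await entryStream.CopyToAsync(Stream.Null);
    }
}
```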
Your root JSON container is not an array, it’s an object:
You will not be able to use JsonSerializer.DeserializeAsyncEnumerable<T> to deserialize such JSON because this method only supports async streaming deserialization of JSON arrays, not objects. And unfortunately System.Text.Json does not support streaming deserialization of objects, or even streaming in general; it supports pipelining. If you need to stream through a file using System.Text.Json you will need to build on this answer by mtosh to Parsing a JSON file with .NET core 3.0/System.text.Json.

As an alternative, you could use Json.NET, which is designed for streaming via JsonTextReader. Your JSON object consists of multiple array-valued properties, and using Json.NET you will be able to stream through your entryStream asynchronously, load each array value into a JToken, then call some callback for each token.

First, introduce the following extension methods:
And now you will be able to do the following, to process the entries in the "BBRSagList" array:
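As a rough sketch of this approach (not necessarily the exact extension methods the answer used), the following helper walks the root object with JsonTextReader, skips every property except the requested one, and loads each element of that array into a JToken before passing it to a callback. It assumes the Newtonsoft.Json NuGet package is referenced; JsonArrayStreaming and StreamArrayItemsAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonArrayStreaming
{
    // Invokes onItem for every element of the array stored under
    // propertyName in the root object; returns the number of elements.
    public static async Task<int> StreamArrayItemsAsync(
        Stream stream, string propertyName, Action<JToken> onItem)
    {
        using var textReader = new StreamReader(stream);
        using var reader = new JsonTextReader(textReader);
        int count = 0;

        // The first token is expected to be the '{' of the root object.
        if (!await reader.ReadAsync() || reader.TokenType != JsonToken.StartObject)
            throw new JsonSerializationException("Expected a JSON object at the root.");

        // Each iteration lands on a property name of the root object.
        while (await reader.ReadAsync() && reader.TokenType != JsonToken.EndObject)
        {
            if ((reader.Value as string) != propertyName)
            {
                await reader.SkipAsync(); // skip this property's value entirely
                continue;
            }

            await reader.ReadAsync(); // move from the name to its value
            if (reader.TokenType != JsonToken.StartArray)
            {
                await reader.SkipAsync(); // not an array: ignore it
                continue;
            }

            // Load one array element at a time, so only a single
            // element is held in memory.
            while (await reader.ReadAsync() && reader.TokenType != JsonToken.EndArray)
            {
                JToken item = await JToken.LoadAsync(reader);
                onItem(item);
                count++;
            }
        }

        return count;
    }
}
```

With this in place, a call such as `await JsonArrayStreaming.StreamArrayItemsAsync(entryStream, "BBRSagList", token => { var bbrSag = token.ToObject<BBRSagList>(); })` would handle each entry as it is read. The property name is compared case-sensitively here, so it has to match the spelling in the file.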
Notes:
As observed by Fildor-standswithMods in comments, you must also declare your Main() method as public static async Task Main() and also await ParseJsonFromZippedFile(entry). (I made ParseJsonFromZippedFile() a static method so there is no reason to allocate a Program instance.)

Mockup fiddle here.