I have a JSON object which looks like this:
{
  "results": [
    {
      "id": "abc456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "DEPARTMENT"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    },
    {
      "id": "def456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "D"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    }
  ]
}
I have to sort the items in each "examples" array of the "results" array by date and return the results in JSON Lines format.
The solution I have right now iterates over every object in the "results" array, sorts its "examples" array by date, and replaces the original array with the sorted one:
var jsonlBuilder = new StringBuilder();
var serializer = JsonSerializer.CreateDefault(new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc });
using (var textWriter = new StringWriter(jsonlBuilder))
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        var employments = obj.SelectToken("examples");
        if (employments.Count() > 1)
        {
            // Sort the examples by date and swap the sorted array in for the original.
            var b = employments.ToObject<JArray>().OrderBy(c => c.SelectToken("date").ToObject<DateTime>(serializer));
            var newEmploymentArray = new JArray(b);
            obj["examples"].Replace(newEmploymentArray);
        }
        obj.WriteTo(jsonWriter);
        // One JSON object per line.
        jsonWriter.WriteWhitespace("\n");
    }
}
This is not performing well. It takes about 6 ms without the code in the if (employments.Count() > 1) block and about 30 ms with it. Is there a better way to do this?
2 Answers
this code works for me
With the JSON shown, your code isn’t performing nearly as badly as you state. I am seeing:
Demo fiddle #1 here.
That being said, there are a few improvements you can make:
1. Move all work outside the OrderBy() since, in general, the comparison will be called n*log(n) times.
2. Use the JToken indexer rather than SelectToken(). The indexer only requires a dictionary lookup, while SelectToken() will first parse the incoming string into JSONPath components, then evaluate each component against the current token(s), which eventually does the same dictionary lookup.
3. Rather than invoking the serializer for every "date" value, invoke it only once by deserializing your JToken hierarchy with DateTimeZoneHandling.Utc + DateParseHandling.DateTime. If you do that, DateTime values will be recognized during reading and the serializer will not be needed subsequently.
4. Avoid cloning of JTokens. When you do employments.ToObject<JArray>() you effectively clone the contents of the array. You also clone a JToken whenever you add it to a parent token without removing it from its current parent. (See this answer for an explanation why.)
5. When reading from a file or stream, be sure to deserialize directly from the stream rather than loading into an intermediate string, as explained in Performance Tips: Optimize Memory Usage. Also consider writing directly to a file rather than to an intermediate StringBuilder.
6. If your JSON has a fixed schema, you might consider designing a corresponding data model and deserializing to it. According to 11 Ways to Improve JSON Performance & Usage by mwatson, parsing to a JToken hierarchy can be up to ~20% slower than deserializing to some data model.

Putting together #1 – #5, your code could be rewritten as follows:
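A minimal sketch of how points #1 to #5 might fit together, assuming Json.NET (Newtonsoft.Json). The class name `JsonLinesSketch` and the method name `WriteSortedJsonLines` are hypothetical, chosen only for illustration, and this is not necessarily the exact code from the demo fiddle:

```csharp
using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonLinesSketch
{
    // Hypothetical helper: reads {"results":[...]} from input and writes
    // one result object per line to output, with "examples" sorted by date.
    public static void WriteSortedJsonLines(TextReader input, TextWriter output)
    {
        JToken root;
        // #3 and #5: load straight from the reader and let it recognize
        // date strings as UTC DateTime values once, while parsing.
        using (var jsonReader = new JsonTextReader(input)
        {
            DateParseHandling = DateParseHandling.DateTime,
            DateTimeZoneHandling = DateTimeZoneHandling.Utc,
            CloseInput = false,
        })
        {
            root = JToken.Load(jsonReader);
        }

        using (var jsonWriter = new JsonTextWriter(output) { Formatting = Formatting.None, CloseOutput = false })
        {
            foreach (var obj in ((JArray)root["results"]).Cast<JObject>())
            {
                // #2: use the indexer instead of SelectToken().
                if (obj["examples"] is JArray examples && examples.Count > 1)
                {
                    // #1: extract each sort key once, before sorting.
                    var sorted = examples
                        .Select(e => new { Date = (DateTime)e["date"], Token = e })
                        .OrderBy(p => p.Date)
                        .Select(p => p.Token)
                        .ToList();

                    // #4: detach the items from the array first, so that
                    // re-adding them moves the tokens instead of cloning them.
                    examples.Clear();
                    foreach (var item in sorted)
                    {
                        examples.Add(item);
                    }
                }
                obj.WriteTo(jsonWriter);
                jsonWriter.WriteWhitespace("\n");
            }
        }
    }
}
```

To write directly to a file as suggested in #5, one would pass a `StreamReader` over the input stream and a `StreamWriter` over the output file rather than string-based readers and writers.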
Alternatively, with approach #6, your data model would look like:
And your code:
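A rough sketch of approach #6, data model and code together, assuming Json.NET. The class names (`Root`, `Result`, `Group`, `Example`, `TypedJsonLinesSketch`) are hypothetical and inferred only from the JSON in the question; in particular, the type of `parent_group` is a guess, since only `null` values are shown:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class Example
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("date")] public DateTime Date { get; set; }
}

public class Group
{
    // Assumption: the question only shows null here, so the real type is unknown.
    [JsonProperty("parent_group")] public object ParentGroup { get; set; }
    [JsonProperty("type")] public string Type { get; set; }
}

public class Result
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("groups")] public List<Group> Groups { get; set; }
    [JsonProperty("examples")] public List<Example> Examples { get; set; }
}

public class Root
{
    [JsonProperty("results")] public List<Result> Results { get; set; }
}

public static class TypedJsonLinesSketch
{
    // Hypothetical helper: deserializes to the model above, sorts each
    // Examples list by date, and writes one Result per line.
    public static void WriteSortedJsonLines(TextReader input, TextWriter output)
    {
        var serializer = JsonSerializer.CreateDefault(new JsonSerializerSettings
        {
            DateTimeZoneHandling = DateTimeZoneHandling.Utc,
        });

        Root root;
        using (var jsonReader = new JsonTextReader(input) { CloseInput = false })
        {
            root = serializer.Deserialize<Root>(jsonReader);
        }

        using (var jsonWriter = new JsonTextWriter(output) { Formatting = Formatting.None, CloseOutput = false })
        {
            foreach (var result in root.Results)
            {
                // List<T>.Sort sorts in place; unlike OrderBy it is not a
                // stable sort, so items with equal dates may change order.
                result.Examples?.Sort((x, y) => x.Date.CompareTo(y.Date));
                serializer.Serialize(jsonWriter, result);
                jsonWriter.WriteWhitespace("\n");
            }
        }
    }
}
```

Properties that are absent from the model would be dropped on output, so this variant only suits JSON whose schema is fully known in advance.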
With these changes made, I am seeing the following average runtimes for both methods:
Demo fiddle #2 here.