I have a JSON object that looks like this:

{
  "results": [
    {
      "id": "abc456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "DEPARTMENT"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    },
    {
      "id": "def456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "D"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    }
  ]
}

I have to sort the items in each "examples" array in the results object by date and return the results in JSON Lines format.

The solution I have right now iterates over every object in the results array, sorts its "examples" array by date, and replaces the original array with the sorted one:

var jsonlBuilder = new StringBuilder();
var serializer = JsonSerializer.CreateDefault(new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc });

using (var textWriter = new StringWriter(jsonlBuilder))
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        var employments = obj.SelectToken("examples");
        if (employments.Count() > 1)
        {
            var b = employments.ToObject<JArray>().OrderBy(c => c.SelectToken("date").ToObject<DateTime>(serializer));
            var newEmploymentArray = new JArray(b);
            obj["examples"].Replace(newEmploymentArray);
        }
        obj.WriteTo(jsonWriter);
        jsonWriter.WriteWhitespace("\n");
    }
}

This is not performing well. It takes about 6ms without the code in the if (employments.Count() > 1) block and 30ms with the if block. Is there a better way to do this?

2 Answers


  1. This code works for me:

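        // Assumes pretty-printed JSON laid out exactly as in the question (one property per line).
        string[] lines = File.ReadLines(@"C:...").ToArray();
    
        // Or, if you already have the text from another source:
        // string[] lines = text.Split("\r\n");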
    
        var arrStart = false;
        List<int> indexes = new();
        List<KeyValuePair<DateTime, string[]>> dates = new();
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].Contains("examples"))
            {
                arrStart = true;
                continue;
            }
            if (arrStart && lines[i].Contains("date"))
            {
                DateTime dateTime = (DateTime)JObject.Parse("{" + ((string)lines[i])
                                                     .Trim()
                                                     .Replace("\"\"", "\"") + "}")["date"];
                                                     
                //Or if you don't want to use any serializer
                //var l = ((string)lines[i]).Replace("\"", "").Trim();
                //var s = l.Substring(l.IndexOf(":")+1).Replace("\"\"", "\"");
                //var dateTime1 = Convert.ToDateTime(s);
            
                dates.Add(new KeyValuePair<DateTime, string[]>(dateTime, new string[] { lines[i - 1], lines[i] }));
                indexes.Add(i);
            }
            else if (arrStart && lines[i].Contains("]"))
            {
                arrStart = false;
                dates = dates.OrderBy(x => x.Key).ToList();
    
                var j = 0;
                foreach (var index in indexes)
                {
                    lines[index - 1] = dates[j].Value[0];
                    lines[index] = dates[j].Value[1];
                    j++;
                }
                dates.Clear();
                indexes.Clear();
            }
        }
        var text = string.Join("\r\n", lines);
    
  2. With the JSON shown, your code isn’t performing nearly as badly as you state. I am seeing:

    • 0.0629 ms average runtime/rep for 10000 reps of your current code.
    • 0.0246 ms average runtime/rep for 10000 reps with sorting removed.

    Demo fiddle #1 here.

    That being said, there are a few improvements you can make:

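    1. Move as much work as possible out of the sort's comparison, since, in general, a comparison sort makes on the order of n*log(n) comparisons. Note that LINQ's OrderBy() evaluates its key selector once per element, so this matters most for comparison-based APIs such as List<T>.Sort().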

    2. Use the JToken indexer rather than SelectToken(). The indexer only requires a dictionary lookup, while SelectToken() will first parse the incoming string into JSONPath components, then evaluate each component against the current token(s), which eventually does the same dictionary lookup.

    3. Rather than invoking the serializer for every "date" value, invoke it only once by deserializing your JToken hierarchy with DateTimeZoneHandling.Utc + DateParseHandling.DateTime. If you do that, DateTime values will be recognized during reading and the serializer will not be needed subsequently.

    4. Avoid cloning of JTokens. When you do employments.ToObject<JArray>() you effectively clone the contents of the array. You also clone a JToken whenever you add it to a parent token without removing it from its current parent. (See this answer for an explanation why.)

    5. When reading from a file or stream, be sure to deserialize directly from the stream rather than loading into an intermediate string, as explained in Performance Tips: Optimize Memory Usage.

      Also consider writing directly to a file rather than to an intermediate StringBuilder.

    6. If your JSON has a fixed schema, you might consider designing a corresponding data model and deserializing to it. According to 11 Ways to Improve JSON Performance & Usage by mwatson, parsing to a JToken hierarchy can be up to ~20% slower than deserializing to some data model.
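
    As a rough illustration of point 1, the sketch below counts how often a delegate runs during a sort. The names SortCallCounter, CountKeySelectorCalls and CountComparisonCalls are made up for this example. It assumes current .NET behavior, where OrderBy() computes each key once, while List<T>.Sort() invokes its comparison on the order of n*log(n) times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Json.NET or the original answer.
public static class SortCallCounter
{
    // Counts how many times the key selector runs when OrderBy() sorts the items.
    public static int CountKeySelectorCalls(IReadOnlyList<int> items)
    {
        var calls = 0;
        _ = items.OrderBy(x => { calls++; return x; }).ToList();
        return calls;
    }

    // Counts how many times the comparison delegate runs when List<T>.Sort() sorts a copy.
    public static int CountComparisonCalls(IReadOnlyList<int> items)
    {
        var calls = 0;
        var copy = items.ToList();
        copy.Sort((a, b) => { calls++; return a.CompareTo(b); });
        return calls;
    }

    public static void Main()
    {
        var random = new Random(12345);
        var items = Enumerable.Range(0, 1000).Select(_ => random.Next()).ToList();

        // The key selector is expected to run once per element.
        Console.WriteLine($"OrderBy key selector calls: {CountKeySelectorCalls(items)}");
        // The comparison is expected to run many more times than there are elements.
        Console.WriteLine($"List.Sort comparison calls: {CountComparisonCalls(items)}");
    }
}
```

    So any expensive work placed in a List<T>.Sort() comparison is repeated far more often than the same work placed in an OrderBy() key selector.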

    Putting together #1 – #5, your code could be rewritten as follows:

    // Deserialize with DateTimeZoneHandling.Utc.
    // This recognizes all DateTime values automatically and populates them in the JToken hierarchy, thereby avoiding the need to deserialize each one individually.
    var settings = new JsonSerializerSettings { 
        DateTimeZoneHandling = DateTimeZoneHandling.Utc, 
        DateParseHandling = DateParseHandling.DateTime 
    };
    
    // Deserialize directly from stream (if reading from file) rather than loading into a string.
    using var textReader = new StringReader(json); // If reading from a file, use a StreamReader and read directly.
    using var jsonReader = new JsonTextReader(textReader);
    var root = JsonSerializer.CreateDefault(settings).Deserialize<JToken>(jsonReader);
    var jsonArray = (JArray)root["results"];
    
    var jsonlBuilder = new StringBuilder();
    using (var textWriter = new StringWriter(jsonlBuilder))  // If writing to a file, use a StreamWriter and write directly.
    using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
    {
        foreach (var obj in jsonArray)
        {
            var employments = obj is JObject ? obj["examples"] as JArray : null; // Use indexer instead of SelectToken()
            if (employments != null && employments.Count > 1) // Null check covers a missing or non-array "examples". Use the Count property rather than the LINQ Count() extension method, https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1829
            {
                var sortedList = employments
                    .Select(e => (e, date : e["date"].Value<DateTime>())) // Use the indexer and cache the DateTime value
                    .OrderBy(p => p.date).Select(p => p.e)                // And sort by the cached value
                    .ToList();
                employments.Clear();  // Prevent cloning of JTokens by clearing the array before re-adding the items.
                foreach (var item in sortedList)
                    employments.Add(item); // Add the existing items rather than creating new items.
            }
            obj.WriteTo(jsonWriter);
            jsonWriter.WriteWhitespace("\n");
        }
    }   
    

    Alternatively, with approach #6, your data model would look like:

    public class Example
    {
        public string id { get; set; }
        public DateTime date { get; set; }
    }
    
    public class Group
    {
        public object parent_group { get; set; }
        public string type { get; set; }
    }
    
    public class Result
    {
        public string id { get; set; }
        public List<Group> groups { get; set; } = new ();
        public List<Example> examples { get; set; } = new ();
    }
    
    public class Root
    {
        public List<Result> results { get; set; } = new ();
    }       
    

    And your code:

    var settings = new JsonSerializerSettings { 
        DateTimeZoneHandling = DateTimeZoneHandling.Utc, 
    };
    var serializer = JsonSerializer.CreateDefault(settings);
    
    // Deserialize directly from stream (if reading from file) rather than loading into a string.
    using var textReader = new StringReader(json); // If reading from a file, use a StreamReader and read directly.
    using var jsonReader = new JsonTextReader(textReader); 
    var root = serializer.Deserialize<Root>(jsonReader);
    
    var jsonArray = root.results;
    
    var jsonlBuilder = new StringBuilder();
    using (var textWriter = new StringWriter(jsonlBuilder)) // If writing to a file, use a StreamWriter and write directly.
    using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
    {
        foreach (var obj in jsonArray)
        {
            if (obj?.examples != null && obj.examples.Count > 0)
                obj.examples.Sort((x, y) => x.date.CompareTo(y.date));
            serializer.Serialize(jsonWriter, obj);
            jsonWriter.WriteWhitespace("\n");
        }
    }   
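
    One difference from the original code is worth keeping in mind: List<T>.Sort() is documented as an unstable sort, whereas OrderBy() is stable. If the relative order of examples with identical dates matters, a stable variant could look like the sketch below. StableSortSketch and SortByDateStable are hypothetical names, and the Example class is repeated only to keep the snippet self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Example
{
    public string id { get; set; }
    public DateTime date { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class StableSortSketch
{
    // Returns a new list ordered by date. OrderBy() is a stable sort,
    // so items with equal dates keep their original relative order.
    public static List<Example> SortByDateStable(IEnumerable<Example> examples) =>
        examples.OrderBy(e => e.date).ToList();

    public static void Main()
    {
        var sameDay = new DateTime(2020, 5, 10, 0, 0, 0, DateTimeKind.Utc);
        var examples = new List<Example>
        {
            new Example { id = "first",  date = sameDay },
            new Example { id = "early",  date = sameDay.AddYears(-1) },
            new Example { id = "second", date = sameDay },
        };

        var sorted = SortByDateStable(examples);
        Console.WriteLine(string.Join(", ", sorted.Select(e => e.id)));
        // prints "early, first, second"
    }
}
```

    The trade-off is an extra list allocation per result, which the in-place Sort() avoids.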
    

    With these changes made, I am seeing the following average runtimes for both methods:

    • Average time per repetition for 10000 reps of original code: 0.0590 ms.
    • Average time per repetition for 10000 reps of modified code: 0.0293 ms (50.41% faster).
    • Average time per repetition for 10000 reps of deserialized data model: 0.0228 ms (61.33% faster).

    Demo fiddle #2 here.
