
I have some pretty messy JSON data that (using brute force and jmespath to query the data) I have managed to get into a list of lists. I need to take the last element of each nested list and split it into two elements, so I can write them to a CSV file for a report. The number of nested lists is dynamic; it can be anywhere from a few hundred to several thousand or more.

Sample:

result_list is my imported JSON data in list format. The example below contains three nested lists, but again, there can be many hundreds of unique values:

[['name_123', '00:00:A1', 8.23, '0.15', '55541-00:AA:11:BB:22:CC:33:DD'], 
 ['site3_name-DB', '00:01:B2', 124.03, '46.72', '86753-00:AA:22:CD:F8:63:D2:B3'], 
 ['LOG-SITE2_DB', '00:00:B3', 32.09, '20.34', '22234-00:AA:11:BB:CC:33:DD:44']]

I specifically need to split the last element of each nested list (result_list[n][4]; it’s always index 4 of the nested list), e.g. 12345-00:11:22:33:44:55:66:77, on the -, and ONLY on the - in that one field, since other elements may sometimes contain - and I don’t want to split those. That leaves me with two new elements, like 12345 and 00:11:22:33:44:55:66:77, still appropriately listed with the matching data. I’d then write that new list:

[['name_123', '00:00:A1', 8.23, '0.15', '55541', '00:AA:11:BB:22:CC:33:DD'], 
 ['site3_name-DB', '00:01:B2', 124.03, '46.72', '86753', '00:AA:22:CD:F8:63:D2:B3'], 
 ['LOG-SITE2_DB', '00:00:B3', 32.09, '20.34', '22234', '00:AA:11:BB:CC:33:DD:44']]

to a CSV file, after adding some more presentable header rows. If a list has a blank/None/etc. in the field I’m splitting, I’d like to skip the split for that list, so the row still ends up in the final CSV report with a blank in that position.

I’d prefer to do this without resorting to any non-standard packages or modules not included in Python 3.10, but just did manage to get Pandas 2.0.2 imported properly if that’s the only way to do this. I know this is a very messy way to handle this JSON data but there’s nothing I can do about that, I’ve gotta work with what I have.

Thanks in advance for the patience. In the past I had the luxury of structured databases when processing this kind of stuff and I don’t in this case, and I’m pretty green to Python.

When attempting to do this with a for/if/else loop, I get list index out of range exceptions that I’m not sure how to handle, and I’m not sure how to properly append the two new split elements to the existing, or to a new, list, like this:

for i in result_list:
    if i[4] is None:
        print(" ")
    else:
        result_list.append(i[4].split('-'))

3 Answers


  1. You probably don’t really need to care that much about it being a list, so you can just use a simple listcomp; using s as the name for the original list, it would look something like:

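    import csv
    writer = csv.writer(…)
    # one CSV row per nested list: the first four fields, then the two halves of field 4
    writer.writerows([[*x[:4], *x[4].partition('-')[0::2]] for x in s])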
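As a quick illustration of why `partition` suits this field, here is a small sketch using one sample value from the question plus two made-up strings. `str.partition` always returns three items, so the `[0::2]` slice always yields exactly two.

```python
value = "55541-00:AA:11:BB:22:CC:33:DD"

# partition returns (before, separator, after); [0::2] skips the separator.
parts = value.partition("-")[0::2]
print(parts)

# With no hyphen the second item is an empty string, so there are still two.
no_hyphen = "55541".partition("-")[0::2]
print(no_hyphen)

# Only the first hyphen is used; later hyphens stay in the second item.
many_hyphens = "1-2-3".partition("-")[0::2]
print(many_hyphens)
```

Note that `partition` is a string method, so a `None` in that position would still need to be checked for before calling it.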
    
    • Take all but the last element: x[:-1]
    • Split the last element on the first hyphen only: x[-1].split('-', 1)
    • Expand these two lists into a single list with * for each.

    result_list = [[*x[:-1], *x[-1].split('-', 1)] for x in result_list]
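The one-liner above assumes every row has a string in the last position. A minimal sketch of a guarded variant follows, assuming every row has at least four leading fields and that a missing, `None`, or empty fifth field should become two blank columns; `split_last` is a hypothetical helper name, not something from the question.

```python
def split_last(row):
    """Return a copy of row with field 4 split on its first '-'.

    Assumes every row has at least 4 leading fields.
    """
    head = list(row[:4])
    last = row[4] if len(row) > 4 else None
    if not last:
        # Missing, None, or empty: keep the row and leave both columns blank.
        return head + ["", ""]
    return head + last.split("-", 1)


result_list = [
    ["name_123", "00:00:A1", 8.23, "0.15", "55541-00:AA:11:BB:22:CC:33:DD"],
    ["site3_name-DB", "00:01:B2", 124.03, "46.72", None],
    ["LOG-SITE2_DB", "00:00:B3", 32.09, "20.34"],
]
result_list = [split_last(row) for row in result_list]
```

Because only index 4 is split, hyphens in other fields such as 'site3_name-DB' are left alone.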
    
  2. Check the row length first, and leave the row in place if there is no data at index 4.

    result_list = [
        ['name_123', '00:00:A1', 8.23, '0.15', '55541-00:AA:11:BB:22:CC:33:DD'],
        ['name_123', '00:00:A1', 8.23, '0.15'],
        ['site3_name-DB', '00:01:B2', 124.03, '46.72', '86753-00:AA:22:CD:F8:63:D2:B3'],
        ['LOG-SITE2_DB', '00:00:B3', 32.09, '20.34', '22234-00:AA:11:BB:CC:33:DD:44'],
        ['LOG-SITE2_DB', '00:00:B3', 32.09, '20.34'],
        ]
    
    for columns in result_list:
        # Only split when index 4 exists and holds a non-empty value.
        if len(columns) > 4 and columns[4]:
            # Slice assignment edits the row in place; rebinding `columns` would not.
            columns[4:] = columns[4].split('-', 1)
    
    # [['name_123', '00:00:A1', 8.23, '0.15', '55541', '00:AA:11:BB:22:CC:33:DD'],
    #  ['name_123', '00:00:A1', 8.23, '0.15'],
    #  ['site3_name-DB', '00:01:B2', 124.03, '46.72', '86753', '00:AA:22:CD:F8:63:D2:B3'],
    #  ['LOG-SITE2_DB', '00:00:B3', 32.09, '20.34', '22234', '00:AA:11:BB:CC:33:DD:44'],
    #  ['LOG-SITE2_DB', '00:00:B3', 32.09, '20.34']]
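For the last step the question describes, writing the rows to a CSV report with a header, a sketch using only the standard `csv` module might look like this. The column names in `HEADER` and the `write_report` helper are placeholders invented for illustration, since the question does not say what the headers should be; rows that were left short are padded with blanks so every line has the same number of columns.

```python
import csv
import io

# Hypothetical column names; the question does not specify the real headers.
HEADER = ["name", "code", "value_a", "value_b", "id", "address"]


def write_report(rows, fileobj):
    """Write a header line plus the rows, padding short rows with blanks."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    for row in rows:
        # Pad rows that have fewer fields than the header.
        padded = list(row) + [""] * (len(HEADER) - len(row))
        writer.writerow(padded)


rows = [
    ["name_123", "00:00:A1", 8.23, "0.15", "55541", "00:AA:11:BB:22:CC:33:DD"],
    ["name_123", "00:00:A1", 8.23, "0.15"],
]
buffer = io.StringIO()  # stand-in for open("report.csv", "w", newline="")
write_report(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=""` so the writer controls the line endings.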
    