I have a list of dicts that follow a consistent structure, where each dict has a list of integers. However, I need to make sure that each dict, when converted to a JSON string, has a byte size below a specified threshold. If a dict exceeds that byte-size threshold, I need to chunk that dict's integer list.
Attempt:
import json

payload: list[dict] = [
    {"data1": [1, 2, 3, 4]},
    {"data2": [8, 9, 10]},
    {"data3": [1, 2, 3, 4, 5, 6, 7]},
]

# Max size in bytes we can allow. This is static and a hard limit that is not variable.
MAX_SIZE: int = 25

def check_and_chunk(arr: list):
    def check_size_bytes(item) -> bool:
        return len(json.dumps(item).encode("utf-8")) > MAX_SIZE

    def chunk(item, chunk_size: int = 2):
        # yield consecutive slices of `item`, each of length `chunk_size`
        for i in range(0, len(item), chunk_size):
            yield item[i:i + chunk_size]

    # First check if the entire payload is already smaller than MAX_SIZE
    if not check_size_bytes(arr):
        return arr

    # Partition the items into the ones that are small enough and the ones that are too big
    small, big = [], []
    for item in arr:
        (big if check_size_bytes(item) else small).append(item)

    # Modify the big items until they are small enough to be moved to the `small` list
    for item in big:
        print(item)
        # This is where I am unsure how best to proceed. I'd like to split the big
        # dicts such that every resulting piece is small enough to go into `small`.
Example of a possible desired result:
payload: list[dict] = [
    {"data1": [1, 2, 3, 4]},
    {"data2": [8, 9, 10]},
    {"data3": [1, 2, 3, 4]},
    {"data3": [5, 6, 7]},
]
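For reference, the desired result above can be checked against the threshold. This sanity check is mine, not part of the question:

```python
import json

# Every dict in the desired result should encode to at most MAX_SIZE bytes.
MAX_SIZE = 25
desired = [
    {"data1": [1, 2, 3, 4]},
    {"data2": [8, 9, 10]},
    {"data3": [1, 2, 3, 4]},
    {"data3": [5, 6, 7]},
]
sizes = [len(json.dumps(d).encode("utf-8")) for d in desired]
print(sizes)  # [23, 21, 23, 20] -- all within the 25-byte limit
```

Note that `json.dumps` inserts spaces after commas by default, so the serialized size is larger than the raw digits alone; any size check has to measure the actual encoded string.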
2 Answers
IIUC, you can use a generator to yield chunks of the right size.
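The answer's code block did not survive extraction here; below is a minimal sketch of what such a size-aware generator could look like. The helper name `chunk_dict` and the single-key-per-dict assumption are mine, not from the original answer:

```python
import json

MAX_SIZE = 25

def chunk_dict(d: dict, max_size: int = MAX_SIZE):
    """Yield dicts with the same key whose JSON encoding fits within max_size."""
    (key, values), = d.items()  # assumes each dict has exactly one key, as in the payload
    chunk = []
    for v in values:
        candidate = chunk + [v]
        if len(json.dumps({key: candidate}).encode("utf-8")) > max_size and chunk:
            # adding v would push us over the limit: emit what we have, start fresh
            yield {key: chunk}
            chunk = [v]
        else:
            chunk = candidate
    if chunk:
        yield {key: chunk}

payload = [
    {"data1": [1, 2, 3, 4]},
    {"data2": [8, 9, 10]},
    {"data3": [1, 2, 3, 4, 5, 6, 7]},
]
result = [c for d in payload for c in chunk_dict(d)]
print(result)
# [{'data1': [1, 2, 3, 4]}, {'data2': [8, 9, 10]},
#  {'data3': [1, 2, 3, 4]}, {'data3': [5, 6, 7]}]
```

One caveat: if a single integer by itself exceeds the limit, this sketch still yields it as a one-element chunk rather than failing.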
My approach starts with the list of integers. I take numbers out of the existing list (which I call `input_sequence`) and place them into a new list (`output_sequence`) until I go over the size limit, at which point I back up one number and build the "chunk".
Notes
- Swap `logging.DEBUG` with `logging.WARN` to silence the per-chunk debug output.
- Consider changing `check_and_chunk` to accept the size limit as a parameter rather than relying on the global var.
- The code uses a `deque` data structure, which behaves like a list, but with faster insert/remove from the left.