I am trying to use multiprocessing's Pool.starmap with keyword arguments in a small Python script I am writing. To do this, I have been following an answer to a Stack Overflow post. However, while implementing it I ran into behaviour I cannot explain. Below is code that reproduces the issue from my original script.
from itertools import repeat
import multiprocessing

# from stackexchange: https://stackoverflow.com/a/53173433/17456342
def starmap_with_kwargs(pool, fn, args_iter, kwargs_iter):
    args_for_starmap = zip(repeat(fn), args_iter, kwargs_iter)
    print(args_iter)
    return pool.starmap(apply_args_and_kwargs, args_for_starmap)

def apply_args_and_kwargs(fn, args, kwargs):
    print('test')
    return fn(*args, **kwargs)

def func(path, dictArg, **kwargs):
    for i in dictArg:
        print(i['a'])
    print(kwargs['yes'])

def funcWrapper(path, dictList, **kwargs):
    args_iter = zip(repeat(path), dictList)
    kwargs_iter = repeat(kwargs)
    # list(args_iter)
    pool = multiprocessing.Pool()
    starmap_with_kwargs(pool, func, args_iter, kwargs_iter)

dictList = [{'a: 2'}, {'a': 65}, {'a': 213}, {'a': 3218}]
path = 'some/path/to/something'
funcWrapper(path, dictList, yes=1)
The issue is the following: if I run the code above, I get a TypeError, which I expect (the error goes away if I remove the loop in func). However, if I uncomment the line list(args_iter), there is no error message at all, and I have no idea why. So my question is: why is there no error when list(args_iter) is included?
I am using Python 3.8.10 on Ubuntu 20.04.6 LTS in WSL.
Below is the output, including the (expected) error message, that I get when the line is left commented out.
<zip object at 0x7fa1ec0b8340>
test
test
test
test
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "testing.py", line 67, in apply_args_and_kwargs
    return fn(*args, **kwargs)
  File "testing.py", line 71, in func
    print(i['a'])
TypeError: string indices must be integers
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "testing.py", line 88, in <module>
    funcWrapper(path, dictList, yes=1)
  File "testing.py", line 82, in funcWrapper
    starmap_with_kwargs(pool, func, args_iter, kwargs_iter)
  File "testing.py", line 61, in starmap_with_kwargs
    return pool.starmap(apply_args_and_kwargs, args_for_starmap)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: string indices must be integers
Answers
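Check the type of the variable dictArg passed to func(path, dictArg, **kwargs).
Modify func to print the arguments it receives:
func got: some/path/to/something {'a: 2'} {'yes': 1}
Iterating over dictArg yields its elements (for a dict, its keys), so i is a string ('a' in your case), and indexing a string with 'a' raises TypeError: string indices must be integers.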
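A minimal sketch of that check, assuming each call receives a single dict such as {'a': 65}. The names func_debug and func_fixed are hypothetical, introduced only for illustration; func_fixed is one possible fix (index the dict directly instead of looping over it), and it returns the value only so the result is easy to inspect.

```python
# Hypothetical debugging version of func: print what it actually receives.
def func_debug(path, dictArg, **kwargs):
    print('func got:', path, dictArg, kwargs)

func_debug('some/path/to/something', {'a': 65}, yes=1)

# Iterating over a dict yields its keys, so each `i` is the string 'a';
# 'a'['a'] is what raises "TypeError: string indices must be integers".
keys = [i for i in {'a': 65}]
print(keys)

# Hypothetical fix, assuming dictArg is one dict like {'a': 65}:
# read the value directly instead of looping over the dict.
def func_fixed(path, dictArg, **kwargs):
    print(dictArg['a'])
    print(kwargs['yes'])
    return dictArg['a']
```

Note that the first entry of dictList, {'a: 2'}, is a set holding the single string 'a: 2' rather than a dict (it looks like it was meant to be {'a': 2}); iterating it yields that string, which fails in the same way.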
This has nothing to do with multiprocessing. It has to do with how Python iterators work: an iterator can be consumed only once, and after that it is exhausted and yields nothing.
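A small stand-alone illustration of that one-shot behaviour, using a zip built the same way as args_iter in the question (the path and dict values here are placeholders):

```python
from itertools import repeat

dictList = [{'a': 65}, {'a': 213}]
# repeat() is infinite, but zip stops when dictList runs out.
args_iter = zip(repeat('some/path'), dictList)

first = list(args_iter)   # consumes every item of the zip iterator
second = list(args_iter)  # the iterator is now exhausted

print(first)
print(second)
```

The first list holds both (path, dict) pairs; the second is empty because nothing is left to consume.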
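zip returns an iterator. You bind it to the variable args_iter here:
args_iter = zip(repeat(path), dictList)
That by itself is not a problem. But when you run this line:
list(args_iter)
the iterator is consumed and its items are put into a list (which is then discarded). After that, the iterator is exhausted, so when you later pass it to starmap_with_kwargs, it is empty: pool.starmap receives no tasks, func is never called, and nothing raises the TypeError. If you comment out the line that turns args_iter into a list, the iterator is not consumed beforehand, func runs on every item, and the error appears.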
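A sketch of both cases without any worker processes. It swaps pool.starmap for itertools.starmap, which unpacks the argument tuples the same way but runs sequentially in the current process, so it is a stand-in rather than the original setup; func here is a simplified, hypothetical version that indexes the dict directly. If the arguments need to be reused, materialize them into a list once and pass that list.

```python
from itertools import repeat, starmap

def apply_args_and_kwargs(fn, args, kwargs):
    return fn(*args, **kwargs)

def starmap_with_kwargs(fn, args_iter, kwargs_iter):
    # Sequential stand-in for pool.starmap (assumption: no Pool involved).
    args_for_starmap = zip(repeat(fn), args_iter, kwargs_iter)
    return list(starmap(apply_args_and_kwargs, args_for_starmap))

def func(path, dictArg, **kwargs):
    # Hypothetical simplified func: read the value instead of looping.
    return dictArg['a'] * kwargs['yes']

dictList = [{'a': 2}, {'a': 65}]
path = 'some/path/to/something'

# Case 1: the iterator is exhausted first, so func is never called.
args_iter = zip(repeat(path), dictList)
list(args_iter)
empty_result = starmap_with_kwargs(func, args_iter, repeat({'yes': 1}))
print(empty_result)

# Case 2: a list can be iterated as many times as needed.
args_list = list(zip(repeat(path), dictList))
result = starmap_with_kwargs(func, args_list, repeat({'yes': 1}))
print(result)
```

Because args_list is a list rather than an iterator, passing it to starmap_with_kwargs a second time gives the same result instead of an empty one.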
Check out this little script:
This will not print out anything at all. However, if you comment out the line
the script will then produce this output: