How to use a generator as an iterable with Multiprocessing map function

Question

When I use a generator as the iterable argument to the multiprocessing.Pool.map function:

pool.map(func, iterable=(x for x in range(10)))

It seems that the generator is fully exhausted before func is ever called.
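A minimal sketch that makes this behaviour visible, assuming CPython's multiprocessing.pool. It uses ThreadPool instead of Pool only so the worker function can read a list shared with the generator; ThreadPool subclasses Pool and inherits the same map() method. The function name observe_exhaustion is hypothetical.

```python
from multiprocessing.pool import ThreadPool


def observe_exhaustion(n=10, workers=4):
    """Report how many items the generator had produced when each call ran.

    ThreadPool is used instead of Pool only so that func can read the
    shared `produced` list; both classes share the same map() code path.
    """
    produced = []

    def gen():
        for i in range(n):
            produced.append(i)  # record the moment each item is yielded
            yield i

    def func(x):
        # Snapshot of how many items existed when this call started.
        return len(produced)

    with ThreadPool(workers) as pool:
        return pool.map(func, gen())


if __name__ == "__main__":
    print(observe_exhaustion())  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Every call sees all 10 items already produced, because map() turned the generator into a list before dispatching any task.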

I want each item to be yielded and passed to a worker process as it is produced. Thanks.

Answer

multiprocessing.Pool.map converts an iterable that has no __len__ method into a list before processing. It does this so it can compute a default chunksize, which the pool uses to group worker arguments into batches and reduce the round-trip cost of scheduling jobs. This is not optimal, especially when chunksize is 1, but since map must exhaust the iterator one way or the other, it's usually not a significant issue.
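If the goal is to hand items to workers while the generator is still producing them, Pool.imap (or imap_unordered when result order doesn't matter) is the usual alternative: it does not call list() on the iterable, and a background task-handler thread pulls items from it as tasks are dispatched. The sketch below assumes that behaviour; stream_results, square and numbers are hypothetical names. Note that the task handler is not throttled by how fast you consume results, so it may still read the generator well ahead of the workers.

```python
from multiprocessing import Pool


def square(x):
    # Runs in a worker process, so it must be a picklable top-level function.
    return x * x


def numbers(n):
    # A plain generator with no __len__, like the one in the question.
    for i in range(n):
        yield i


def stream_results(pool, func, iterable, chunksize=1):
    """Yield func(item) in input order while `iterable` is consumed incrementally."""
    # Unlike map(), imap() never calls list(iterable); items are pulled
    # from the generator lazily as tasks are handed to the workers.
    yield from pool.imap(func, iterable, chunksize=chunksize)


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        for result in stream_results(pool, square, numbers(10)):
            print(result)
```

The `if __name__ == "__main__"` guard matters on platforms that start workers with the spawn method, since each child re-imports the main module.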

The relevant code is in pool.py. Notice its use of len:

def _map_async(self, func, iterable, mapper, chunksize=None, callback=None,
        error_callback=None):
    '''
    Helper function to implement map, starmap and their async counterparts.
    '''
    if self._state != RUN:
        raise ValueError("Pool not running")
    if not hasattr(iterable, '__len__'):
        iterable = list(iterable)

    if chunksize is None:
        chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
        if extra:
            chunksize += 1
    if len(iterable) == 0:
        chunksize = 0
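The default chunksize logic in this excerpt can be reproduced as a small standalone function. This is a sketch that mirrors only the `chunksize is None` branch; default_chunksize is a hypothetical name, not part of the multiprocessing API. It shows why map needs len(): the batch size depends on the total number of items.

```python
def default_chunksize(n_items: int, n_workers: int) -> int:
    """Hypothetical helper mirroring the default chunksize in _map_async."""
    if n_items == 0:
        # The excerpt forces chunksize to 0 for an empty iterable.
        return 0
    # Aim for roughly four chunks per worker process.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        # Round up so all items fit in at most n_workers * 4 chunks.
        chunksize += 1
    return chunksize


if __name__ == "__main__":
    print(default_chunksize(10, 4))    # 1
    print(default_chunksize(1000, 4))  # 63
```

For example, 10 items on a 4-process pool give a chunksize of 1, while 1000 items give 63, which is only computable once the length is known.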
