加入一组产生 Python 迭代器的有序整数

Joining a set of ordered-integer yielding Python iterators(加入一组产生 Python 迭代器的有序整数)
本文介绍了加入一组产生 Python 迭代器的有序整数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

这是一个看似简单的问题:给定一个生成升序整数序列的迭代器列表,编写一个只生成每个序列中出现的整数的简洁生成器.

Here is a seemingly simple problem: given a list of iterators that yield sequences of integers in ascending order, write a concise generator that yields only the integers that appear in every sequence.

昨晚阅读了几篇论文后,我决定用 Python 编写一个完全最小的全文索引器,在这里看到(虽然那个版本现在很老了).

After reading a few papers last night, I decided to hack up a completely minimal full text indexer in Python, as seen here (though that version is quite old now).

我的问题在于 search() 函数,它必须遍历每个发布列表并仅生成每个列表中出现的文档 ID.从上面的链接可以看出,我当前的非递归工作"尝试很糟糕.

My problem is with the search() function, which must iterate over each posting list and yield only the document IDs that appear on every list. As you can see from the link above, my current non-recursive 'working' attempt is terrible.

示例:

postings = [[1,   100, 142, 322, 12312],
            [2,   100, 101, 322, 1221],
            [100, 142, 322, 956, 1222]]

应该让步:

[100, 322]

至少有一个优雅的递归函数解决方案,但如果可能的话,我想避免这种情况.但是,涉及嵌套生成器表达式、itertools 滥用或任何其他类型的代码高尔夫的解决方案非常受欢迎.:-)

There is at least one elegant recursive function solution to this, but I'd like to avoid that if possible. However, a solution involving nested generator expressions, itertools abuse, or any other kind of code golf is more than welcome. :-)

应该可以将函数安排为只需要与最小列表中的项目一样多的步骤,而无需将整组整数吸入内存.将来,这些列表可能会从磁盘中读取,并且大于可用的 RAM.

It should be possible to arrange for the function to only require as many steps as there are items in the smallest list, and without sucking the entire set of integers into memory. In future, these lists may be read from disk, and larger than available RAM.

在过去的 30 分钟里,我在舌尖有了一个想法,但我无法将其完全融入代码中.请记住,这只是为了好玩!

For the past 30 minutes I've had an idea on the tip of my tongue, but I can't quite get it into code. Remember, this is just for fun!

推荐答案

import heapq, itertools
def intersect(*its):
    for key, values in itertools.groupby(heapq.merge(*its)):
        if len(list(values)) == len(its):
            yield key

>>> list(intersect(*postings))
[100, 322]

这篇关于加入一组产生 Python 迭代器的有序整数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

python arbitrarily incrementing an iterator inside a loop(python在循环内任意递增迭代器)
Iterating over dictionary items(), values(), keys() in Python 3(在 Python 3 中迭代字典 items()、values()、keys())
What is the Perl version of a Python iterator?(Python 迭代器的 Perl 版本是什么?)
How to create a generator/iterator with the Python C API?(如何使用 Python C API 创建生成器/迭代器?)
Python generator behaviour(Python 生成器行为)
Splitting a string into 2-letter segments(将字符串拆分为 2 个字母的段)