BeautifulSoup counting tags without parsing deep inside them




I thought about the following while writing an answer to this question.

Suppose I have a deeply nested XML file like this (but much more nested and much longer):

<section name="1">
    <subsection name="foo">
        <subsubsection name="bar">
            <deeper name="hey">
                <much_deeper name="yo">
                    <li>Some content</li>
                </much_deeper>
            </deeper>
        </subsubsection>
    </subsection>
</section>
<section name="2">
    ... and so forth
</section>

The problem with len(soup.find_all("section")) is that, while running find_all("section"), BS keeps searching deep inside tags that I know won't contain any other section tag.


  1. Is there a way to make BS not search recursively into an already found tag?
  2. If the answer to 1 is yes, will it be more efficient, or is it the same internal process?


BeautifulSoup cannot give you just a count of the tags it found; find_all always returns the matching tags themselves.

What you can improve, though, is to stop BeautifulSoup from searching for sections inside other sections by passing recursive=False, which limits find_all to the direct children of the object it is called on:

len(soup.find_all("section", recursive=False))
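To make the difference visible, here is a minimal sketch. The sample document, the html.parser choice, and the variable names are assumptions for illustration, and a nested section is added on purpose so that the two counts differ. Note that recursive=False only helps when the sections are direct children of the object you call find_all on.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two top-level sections, plus one nested section
# added only so that the recursive and non-recursive counts differ.
xml = """
<section name="1">
    <subsection name="foo">
        <section name="nested">
            <li>Some content</li>
        </section>
    </subsection>
</section>
<section name="2">
    <li>More content</li>
</section>
"""

# html.parser ships with Python; any parser supported by bs4 could be used.
soup = BeautifulSoup(xml, "html.parser")

# Default behaviour: walks every descendant of soup.
all_sections = len(soup.find_all("section"))

# recursive=False: looks only at the direct children of soup.
top_level_sections = len(soup.find_all("section", recursive=False))

print(all_sections, top_level_sections)
```

The document is still parsed in full either way; recursive=False only shortens the search over the tree that has already been built.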

Aside from that improvement, lxml would do the job faster.
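A rough sketch of what counting with lxml might look like (this is not the original answer's code). It assumes the sections sit directly under a single root element, here called root, which is a hypothetical name; the fallback import is there only so the snippet also runs without lxml installed.

```python
try:
    from lxml import etree  # C-based parser, usually much faster
except ImportError:
    # Standard-library fallback; it supports the calls used below.
    import xml.etree.ElementTree as etree

# Hypothetical sample wrapped in a single root element, since an XML
# document needs exactly one.
xml = b"""<root>
<section name="1">
    <subsection name="foo"><li>Some content</li></subsection>
</section>
<section name="2">
    <li>More content</li>
</section>
</root>"""

root = etree.fromstring(xml)

# findall with a bare tag name matches direct children of root only,
# so it never descends into the sections themselves.
top_level = len(root.findall("section"))

# iter walks the whole tree, for comparison.
everywhere = sum(1 for _ in root.iter("section"))

print(top_level, everywhere)
```

Both counts are the same here because no section is nested inside another; only the first one avoids walking the deep subtrees.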

