How to detect paragraphs in a text document image for a non-consistent text structure in Python OpenCV

Problem description

I am trying to identify paragraphs of text in a .pdf document by first converting it into an image, then using OpenCV. However, I am getting bounding boxes on lines of text instead of on paragraphs. How can I set a threshold or some other limit to get paragraphs instead of lines?
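Taking "some threshold" literally, one alternative (not the approach used in the answer) is to keep line-level boxes like the ones the question's code produces and merge consecutive lines whose vertical gap is small. The pure-Python sketch below assumes a single-column page and `(x, y, w, h)` boxes as returned by `cv2.boundingRect`; the helper name `merge_lines_into_paragraphs` and the `max_gap` value are hypothetical and would need tuning to the document's line spacing.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), as returned by cv2.boundingRect


def merge_lines_into_paragraphs(boxes: List[Box], max_gap: int) -> List[Box]:
    """Merge line boxes whose vertical gap to the previous box is <= max_gap.

    Hypothetical helper: assumes a single-column layout, so lines are
    processed strictly top to bottom.
    """
    paragraphs: List[Box] = []
    for x, y, w, h in sorted(boxes, key=lambda b: b[1]):
        if paragraphs:
            px, py, pw, ph = paragraphs[-1]
            gap = y - (py + ph)  # empty rows between the current paragraph and this line
            if gap <= max_gap:
                # Grow the current paragraph box so it also covers this line
                nx, ny = min(px, x), min(py, y)
                nx2, ny2 = max(px + pw, x + w), max(py + ph, y + h)
                paragraphs[-1] = (nx, ny, nx2 - nx, ny2 - ny)
                continue
        paragraphs.append((x, y, w, h))
    return paragraphs


if __name__ == "__main__":
    # Three tightly spaced lines, a large gap, then two more lines
    lines = [(10, 10, 300, 12), (10, 26, 280, 12), (10, 42, 290, 12),
             (10, 90, 300, 12), (10, 106, 150, 12)]
    print(merge_lines_into_paragraphs(lines, max_gap=8))
    # → [(10, 10, 300, 44), (10, 90, 300, 28)]
```

With `max_gap=8`, the first three lines (4 px apart) collapse into one box and the last two into another, because the 36 px gap between them exceeds the threshold.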

Here is the sample input image:

Here is the output I am getting for the above sample:

I am trying to get a single bounding box on the paragraph in the middle. I am using this code.

import cv2
import numpy as np

large = cv2.imread('sample image.png')
rgb = cv2.pyrDown(large)  # downsample to half resolution
small = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)

# Morphological gradient (dilation minus erosion) highlights character edges
# kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
kernel = np.ones((5, 5), np.uint8)
grad = cv2.morphologyEx(small, cv2.MORPH_GRADIENT, kernel)

_, bw = cv2.threshold(grad, 0.0, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Horizontal-only kernel (width 9, height 1): closes gaps within a line
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1))
connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)

# using RETR_EXTERNAL instead of RETR_CCOMP
# findContours returns (contours, hierarchy) in OpenCV 4.x but
# (image, contours, hierarchy) in OpenCV 3.x; [-2:] works with both
contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

mask = np.zeros(bw.shape, dtype=np.uint8)

for idx in range(len(contours)):
    x, y, w, h = cv2.boundingRect(contours[idx])
    # Fill this contour into the mask, then measure how much of its box it covers
    mask[y:y+h, x:x+w] = 0
    cv2.drawContours(mask, contours, idx, (255, 255, 255), -1)
    r = float(cv2.countNonZero(mask[y:y+h, x:x+w])) / (w * h)

    # Keep boxes that are at least 45% filled and wider and taller than 8 px
    if r > 0.45 and w > 8 and h > 8:
        cv2.rectangle(rgb, (x, y), (x+w-1, y+h-1), (0, 255, 0), 2)


cv2.imshow('rects', rgb)
cv2.waitKey(0)

Solution

This is a classic use for dilation. Whenever you want to connect multiple items together, you can dilate them so that adjacent contours join into a single contour. Here's a simple approach:

  • Convert the image to grayscale and apply a Gaussian blur
  • Apply Otsu's threshold
  • Dilate to connect adjacent words together
  • Find contours and draw their bounding rectangles

Otsu's threshold
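With `cv2.THRESH_OTSU`, the threshold value passed in (the `0` in the code) is ignored; OpenCV instead picks the gray level that best separates the histogram into two classes, i.e. the one that maximizes the between-class variance. The numpy-only sketch below illustrates that criterion; `otsu_threshold` is a hypothetical name, and OpenCV's own implementation may break ties differently.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the level t maximizing between-class variance of classes <= t and > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)                  # pixel count of class "<= t"
    w1 = w0[-1] - w0                      # pixel count of class "> t"
    s0 = np.cumsum(hist * levels)         # intensity sum of class "<= t"
    s1 = s0[-1] - s0                      # intensity sum of class "> t"
    valid = (w0 > 0) & (w1 > 0)           # both classes must be non-empty
    sigma_b = np.zeros(256)
    mu0 = s0[valid] / w0[valid]
    mu1 = s1[valid] / w1[valid]
    # Proportional to the between-class variance w0 * w1 * (mu0 - mu1)^2 / n^2
    sigma_b[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return int(np.argmax(sigma_b))


if __name__ == "__main__":
    img = np.array([[50, 50, 200, 200]], dtype=np.uint8)
    print(otsu_threshold(img))  # → 50
```

Every level from 50 up to 199 splits this toy image equally well, and `np.argmax` returns the first of them; pixels `<= t` form one class and pixels `> t` the other.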

Here's where the magic happens. We can assume that a paragraph is a group of words that are close together. To achieve this, we dilate to connect adjacent words.

Result

import cv2
import numpy as np

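# Load image, grayscale, Gaussian blur, Otsu's threshold.
# THRESH_BINARY_INV turns the dark text white on a black background, because
# dilate() grows bright regions and findContours() treats nonzero pixels as foreground.
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Create rectangular structuring element and dilate. A larger kernel or more
# iterations bridges wider gaps (words -> lines -> paragraphs).
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
dilate = cv2.dilate(thresh, kernel, iterations=4)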

# Find contours and draw rectangle
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()
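The code above draws every contour it finds. To end up with only the paragraph the question asks about, one option is to collect the boxes (for example `boxes = [cv2.boundingRect(c) for c in cnts]`), drop tiny ones, and keep the largest. This is a hypothetical post-processing sketch, not part of the answer: `min_area` is an assumed value that depends on the image resolution, and "largest box" is only a heuristic for "the paragraph in the middle".

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) from cv2.boundingRect


def filter_and_sort(boxes: List[Box], min_area: int) -> List[Box]:
    """Drop boxes smaller than min_area; sort the rest top-to-bottom, then left-to-right."""
    kept = [b for b in boxes if b[2] * b[3] >= min_area]
    return sorted(kept, key=lambda b: (b[1], b[0]))


def largest_box(boxes: List[Box]) -> Optional[Box]:
    """Return the box with the largest area, or None if there are no boxes."""
    return max(boxes, key=lambda b: b[2] * b[3], default=None)


if __name__ == "__main__":
    # Made-up boxes: a heading, a large body paragraph, a small footer block, a page number
    boxes = [(40, 300, 500, 180), (40, 20, 500, 60), (560, 780, 12, 10), (40, 520, 500, 90)]
    print(filter_and_sort(boxes, min_area=500))
    # → [(40, 20, 500, 60), (40, 300, 500, 180), (40, 520, 500, 90)]
    print(largest_box(boxes))
    # → (40, 300, 500, 180)
```

A selected box can then be cropped with `image[y:y+h, x:x+w]`, for example to pass just that paragraph on to OCR.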
