Detect horizontal blank lines in .pdf form image with OpenCV
Problem description
I have .pdf files that have been converted to .jpg images for this project. My goal is to identify the blanks (e.g. ____________) that you would generally find in a .pdf form, which indicate a space for the user to sign or fill out some kind of information. I have been using edge detection with the cv2.Canny() and cv2.HoughLinesP() functions.
This works fairly well, but it produces quite a few false positives that seem to come from nowhere. When I look at the 'edges' image, it shows a lot of noise around the other words. I'm not sure where this noise comes from.
Should I continue to tweak the parameters, or is there a better method to find the location of these blanks?
Recommended answer
Assuming that you're trying to find horizontal lines on a .pdf form, here's a simple approach:
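- Convert the image to grayscale and apply Otsu's threshold to obtain a binary image
- Construct a special kernel that detects only horizontal lines
- Perform morphological transformations
- Find contours and draw them onto the image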
Using this example image
Convert to grayscale and apply Otsu's threshold to obtain a binary image
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
Then we create a kernel with cv2.getStructuringElement() and perform morphological transformations to isolate horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
From here we could use cv2.HoughLinesP() to detect the lines, but since the image has already been preprocessed and the horizontal lines isolated, we can simply find contours and draw the result
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(image, [c], -1, (36,255,12), 3)
Full code
import cv2

# Load the form page (an image converted from the .pdf)
image = cv2.imread('2.png')

# Grayscale, then Otsu's threshold with inversion so ink becomes white (255)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Opening with a wide, 1-pixel-tall kernel keeps only long horizontal runs
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

# findContours returns 2 values in OpenCV 4.x and 3 values in OpenCV 3.x
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(image, [c], -1, (36, 255, 12), 3)

cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()
