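For images with a solid-color background and a relatively simple structure, traditional OpenCV image processing is enough to segment them. This note records an implementation of image cutting based on pixel projection of a binarized image. The image below, for example, can be cut with this algorithm. (The full source code is at the end.)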

The result after cutting:

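Idea: for an image with a white background, count the black pixels in every row and every column to get two lists of accumulated black-pixel counts, one horizontal and one vertical. If an element is 0, that row or column contains no black pixels and can be treated as background. When cutting, a boundary lies wherever two adjacent elements change between zero and non-zero. The same idea works for images with a black background; this note uses a white-background image as the example.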
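As a minimal sketch of the boundary rule, assuming a projection list is already available: the helper name `find_transitions` and the sample counts are made up for illustration and are not part of the code in this post.

```python
def find_transitions(profile):
    """Return (rising, falling) indices where the profile switches
    between zero (background) and non-zero (content)."""
    rising, falling = [], []
    for i in range(1, len(profile)):
        prev_empty = profile[i - 1] == 0
        curr_empty = profile[i] == 0
        if prev_empty and not curr_empty:
            rising.append(i)    # first index of a content run
        elif not prev_empty and curr_empty:
            falling.append(i)   # first background index after a run
    return rising, falling


# Hypothetical black-pixel counts for eight rows
profile = [0, 0, 3, 5, 0, 0, 2, 0]
rising, falling = find_transitions(profile)
print(rising, falling)
```

Each rising index paired with the next falling index brackets one content region.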

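First the image has to be binarized. I use adaptive thresholding; other binarization methods also work, but adaptive thresholding has the advantage that, as long as the constant C is greater than 0, flat background areas come out white whether the original background was black or white.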
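To see why a positive constant turns flat areas white, here is a simplified NumPy-only sketch of the rule "white where pixel > local mean - C". It uses a plain box mean rather than the Gaussian weighting used in this post, and `mean_adaptive_threshold` is a hypothetical name, so treat it as an illustration of the idea rather than a replacement for `cv2.adaptiveThreshold`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mean_adaptive_threshold(gray, block_size=3, C=10):
    """Mean-based adaptive threshold: white where pixel > local mean - C."""
    pad = block_size // 2
    # Replicate border pixels so every pixel has a full neighbourhood
    padded = np.pad(gray.astype(np.float64), pad, mode='edge')
    windows = sliding_window_view(padded, (block_size, block_size))
    local_mean = windows.mean(axis=(-2, -1))
    return np.where(gray > local_mean - C, 255, 0).astype(np.uint8)


dark_flat = np.full((5, 5), 20, dtype=np.uint8)    # uniform dark background
light_flat = np.full((5, 5), 230, dtype=np.uint8)  # uniform light background
stroke = np.full((5, 5), 200, dtype=np.uint8)
stroke[2, 2] = 0                                   # one dark pixel on a light background

out_dark = mean_adaptive_threshold(dark_flat)
out_light = mean_adaptive_threshold(light_flat)
out_stroke = mean_adaptive_threshold(stroke)
```

In a flat area the pixel equals its local mean, so it is always above mean - C when C > 0 and becomes white; only pixels noticeably darker than their surroundings turn black.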

import cv2


def adaptive_threshold(gray, blockSize=5, C=10, inv=False):
    if not inv:
        thresholdType = cv2.THRESH_BINARY
    else:
        thresholdType = cv2.THRESH_BINARY_INV
    # Adaptive thresholding changes the threshold according to the brightness of each local region
    binary_img = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, thresholdType, blockSize, C)
    return binary_img

if __name__ == "__main__":
    img_path = 'xxx.jpg'
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img , cv2.COLOR_BGR2GRAY)

    binary_img = adaptive_threshold(gray, blockSize=15)

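With the binary image in hand, the next step is to compute the horizontal and vertical projections. The method is simple and crude: visit every pixel.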

def get_projection_list_demo(binary_img):
    h, w = binary_img.shape[:2]
    row_list = [0] * h
    col_list = [0] * w
    for row in range(h):
        for col in range(w):
            if binary_img[row, col] == 0:
                row_list[row] = row_list[row] + 1
                col_list[col] = col_list[col] + 1

    # Show the horizontal projection
    temp_img_1 = 255 - np.zeros((binary_img.shape[0], max(row_list)))
    for row in range(h):
        for i in range(row_list[row]):
            temp_img_1[row, i] = 0
    cv2.imshow('horizontal', temp_img_1)

    # Show the vertical projection
    temp_img_2 = 255 - np.zeros((max(col_list), binary_img.shape[1]))
    for col in range(w):
        for i in range(col_list[col]):
            temp_img_2[i, col] = 0
    cv2.imshow('vertical', temp_img_2)
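The double loop is easy to follow, but in pure Python it gets slow on large images. Below is a NumPy-only sketch that should produce the same counts; `get_projection_list_fast` is a name I made up, and it assumes a single-channel image in which black pixels are exactly 0.

```python
import numpy as np


def get_projection_list_fast(binary_img, direction='horizontal'):
    """Count black (== 0) pixels per row ('horizontal') or per column."""
    black = (binary_img == 0)
    # Summing along axis 1 collapses the columns, leaving one count per row
    axis = 1 if direction == 'horizontal' else 0
    return black.sum(axis=axis).tolist()


demo = np.array([[0, 255, 0, 0],
                 [255, 255, 255, 255],
                 [0, 0, 255, 255]], dtype=np.uint8)
rows = get_projection_list_fast(demo, 'horizontal')
cols = get_projection_list_fast(demo, 'vertical')
print(rows, cols)
```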

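The horizontal projection makes it easy to separate the rows, while the vertical projection of the whole image is not obviously useful. If each row is taken out and projected vertically on its own, though, the text blocks within that row can be separated. Following this idea, the image is first cut horizontally, each resulting horizontal strip is then cut vertically, and the regions left at the end are roughly the set of target regions we want.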

So the first step is to extract the start and end position of each region from the horizontal projection list of the image above.

def split_projection_list(projectionList: list, minValue=0):
    start = 0
    end = None

    split_list = []
    for idx, value in enumerate(projectionList):
        if value > minValue:
            end = idx
        else:
            if end is not None:
                split_list.append((start, end))
                end = None
            start = idx
    else:
        if end is not None:
            split_list.append((start, end))
            end = None
    return split_list

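This function returns a list holding the start and end position of each black connected region. Note that start is the index of the last background element before the run (or 0 when the run begins at index 0), and end is the index of the last element of the run.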
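A small worked example may make the returned indices easier to read. The function body below is copied from above so that the snippet runs on its own; the sample projection lists are made up.

```python
def split_projection_list(projectionList: list, minValue=0):
    start = 0
    end = None

    split_list = []
    for idx, value in enumerate(projectionList):
        if value > minValue:
            end = idx
        else:
            if end is not None:
                split_list.append((start, end))
                end = None
            start = idx
    else:
        # Runs after the loop finishes: flush a region that touches the last index
        if end is not None:
            split_list.append((start, end))
            end = None
    return split_list


basic = split_projection_list([0, 0, 3, 5, 0, 0, 2, 0])
stricter = split_projection_list([0, 0, 3, 5, 0, 0, 2, 0], minValue=2)
touches_start = split_projection_list([4, 4, 0])
touches_end = split_projection_list([0, 4, 4])
print(basic, stricter, touches_start, touches_end)
```

In `basic`, the first region is reported as (1, 3): index 1 is the last background element before the run and index 3 is the last element inside it.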

With this list, the result of the first horizontal cut can be drawn on the image.

def get_projection_list(binary_img, direction='horizontal'):
    h, w = binary_img.shape[:2]
    row_list = [0] * h
    col_list = [0] * w
    for row in range(h):
        for col in range(w):
            if binary_img[row, col] == 0:
                row_list[row] = row_list[row] + 1
                col_list[col] = col_list[col] + 1
    if direction == 'horizontal':
        return row_list
    else:
        return col_list


img_h, img_w = binary_img.shape[:2]
projection_list = get_projection_list(binary_img, direction='horizontal')
split_list = split_projection_list(projection_list, minValue=0)
for start, end in split_list:
    x, y, w, h = 0, start, img_w, end - start
    p1 = (x, y)
    p2 = (x + w, y + h)

    cv2.rectangle(binary_img, p1, p2, 0)

cv2.imshow('split', binary_img)

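Next, each row is split again with a vertical projection. For generality, the cutting algorithm can be abstracted into one function that cuts recursively, alternating between horizontal and vertical: direction sets the direction of the first cut, and iteration is the number of alternating cuts. For example, direction=horizontal with iteration=3 means horizontal -> vertical -> horizontal.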

def cut_binary_img(binary_img, direction='horizontal', iteration=2):
    if iteration <= 0:
        return
    img_h, img_w = binary_img.shape[:2]
    projection_list = get_projection_list(binary_img, direction)
    split_list = split_projection_list(projection_list, minValue=0)

    for start, end in split_list:
        if direction == 'horizontal':
            x, y, w, h = 0, start, img_w, end - start
        else:
            x, y, w, h = start, 0, end - start, img_h

        roi = binary_img[y:y+h, x:x+w]
        if direction == 'horizontal':
            next_direction = 'vertical'
        else:
            next_direction = 'horizontal'
        cut_binary_img(roi, next_direction, iteration - 1)
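To see what direction and iteration do without opening any windows, here is a NumPy-only sketch of the same alternating recursion that collects the leaf rectangles. The names `nonzero_runs` and `cut` and the synthetic layout are my own assumptions; unlike `split_projection_list`, the runs here are half-open (start, stop) pairs.

```python
import numpy as np


def nonzero_runs(profile):
    """Half-open (start, stop) index pairs for each run of non-zero counts."""
    mask = np.concatenate(([False], np.asarray(profile) > 0, [False]))
    edges = np.flatnonzero(mask[1:] != mask[:-1])
    return [(int(a), int(b)) for a, b in zip(edges[::2], edges[1::2])]


def cut(black, x0=0, y0=0, direction='horizontal', iteration=2):
    """Alternate horizontal/vertical cuts; return leaf rects as (x, y, w, h)."""
    h, w = black.shape
    if iteration <= 0:
        return [(x0, y0, w, h)]
    axis = 1 if direction == 'horizontal' else 0
    next_direction = 'vertical' if direction == 'horizontal' else 'horizontal'
    rects = []
    for start, stop in nonzero_runs(black.sum(axis=axis)):
        if direction == 'horizontal':
            roi, dx, dy = black[start:stop, :], 0, start
        else:
            roi, dx, dy = black[:, start:stop], start, 0
        rects += cut(roi, x0 + dx, y0 + dy, next_direction, iteration - 1)
    return rects


# Synthetic layout: two text lines with two words each, plus a tall block on the right
black = np.zeros((10, 14), dtype=bool)
black[1:4, 1:3] = True
black[1:4, 4:6] = True
black[6:9, 1:4] = True
black[6:9, 5:7] = True
black[1:9, 9:13] = True

v_first = cut(black, direction='vertical', iteration=3)
h_first = cut(black, direction='horizontal', iteration=3)
print(len(v_first), len(h_first))
```

On this layout, cutting vertically first isolates the four words and the block, while cutting horizontally first leaves each text line as a single rectangle, because the tall block fills the blank rows between the lines.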

Result of horizontal -> vertical cutting with the cut_binary_img function above:

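The vertical cut does take effect, but not in the way we want: first, the characters are cut out one by one; second, the dashed frame in the original image is cut out as well.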

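The dashed frame is very small in height or width, so it can be filtered out by the difference between the start and end of each black run in the projection. As for the text, erosion can be used to connect neighbouring characters (on a white background, erosion grows the black strokes). So the algorithm needs to be modified.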
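Both fixes can be tried on a toy image first. The sketch below uses NumPy only: `erode_rows` imitates erosion with a flat 1 x width kernel as a minimum filter along each row, and `black_runs` drops runs that are too short. Both names, the image, and the half-open run convention are assumptions of mine, not the code used in this post.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode_rows(binary_img, width):
    """Minimum filter along each row with a 1 x width window (width must be odd)."""
    pad = width // 2
    # Pad with white so the border never introduces black pixels
    padded = np.pad(binary_img, ((0, 0), (pad, pad)), constant_values=255)
    return sliding_window_view(padded, width, axis=1).min(axis=-1)


def black_runs(profile, min_extent=0):
    """Half-open (start, stop) runs of non-zero counts, dropping short runs."""
    mask = np.concatenate(([False], np.asarray(profile) > 0, [False]))
    edges = np.flatnonzero(mask[1:] != mask[:-1])
    runs = [(int(a), int(b)) for a, b in zip(edges[::2], edges[1::2])]
    return [(a, b) for a, b in runs if b - a >= min_extent]


img = np.full((12, 16), 255, dtype=np.uint8)
img[2:8, 2:4] = 0    # first "character"
img[2:8, 6:8] = 0    # second "character", two columns away
img[10, 1:15] = 0    # a one-pixel-high line, standing in for the dashed frame

row_runs_all = black_runs((img == 0).sum(axis=1))
row_runs_kept = black_runs((img == 0).sum(axis=1), min_extent=5)

band = img[2:8]
cols_before = black_runs((band == 0).sum(axis=0))
cols_after = black_runs((erode_rows(band, 5) == 0).sum(axis=0))
print(row_runs_all, row_runs_kept, cols_before, cols_after)
```

The thin line disappears once short runs are dropped, and the two characters merge into one column run after the horizontal erosion.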

# Erode the binary image
binary_img = adaptive_threshold(gray, blockSize=15)

# A wide, flat kernel strengthens the erosion in the horizontal direction
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 1))
# The eroded image replaces the original binary image
erode_img = cv2.erode(binary_img, kernel)

def cut_binary_img(binary_img, direction='horizontal', iteration=2):
    if iteration <= 0:
        return
    img_h, img_w = binary_img.shape[:2]

    projection_list = get_projection_list(binary_img, direction)
    split_list = split_projection_list(projection_list, minValue=0)
    for start, end in split_list:
        if end - start < 5: # filter out the dashed frame
            continue
        if direction == 'horizontal':
            x, y, w, h = 0, start, img_w, end - start
        else:
            x, y, w, h = start, 0, end - start, img_h

        roi = binary_img[y:y+h, x:x+w]
        if direction == 'horizontal':
            next_direction = 'vertical'
        else:
            next_direction = 'horizontal'
        cut_binary_img(roi, next_direction, iteration - 1)

But the eroded image shows that a lot of noise has been amplified, which may affect the projection.

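So the image can be denoised beforehand. Here I apply a Gaussian filter to the original image and then get the eroded image from the denoised result.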

img = cv2.imread(img_path)
blurred = cv2.GaussianBlur(img, (3, 3), 0) # Gaussian filter
gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)

 

The noise is mostly gone. Let's cut it and look at the result:

This is basically the ideal cutting result. Let's try some other images.

Not bad. Now look at this image:

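This image differs from the first one because of the extra picture block on the right. It therefore has to be cut vertically first, then horizontally, and finally vertically again, three iterations in total, so the call is changed accordingly. (The call below uses the signature from the full source at the end, which also takes the x and y offset of the region.) Let's see the result.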

H = 'horizontal'
V =  'vertical'
cut_binary_img(erode_img, 0, 0, direction=V, iteration=3)

 

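As expected, it fails: the upper half cannot be split any further. The reason is that the text and the picture have become connected in the eroded image.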

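As a fix, places that are only thinly connected must not be counted as one connected region. This is done by changing the minValue passed to split_projection_list; I use the mean of the projection to compute the minimum threshold.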

minValue = int(0.1 * sum(projection_list) / len(projection_list))
split_list = split_projection_list(projection_list, minValue)
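A quick check of what this threshold does, using a made-up projection in which two blocks are joined by a thin bridge of one black pixel per row or column. `count_regions` is a hypothetical helper that only counts runs above the threshold; it is not the splitting function itself.

```python
from itertools import groupby


def count_regions(projection, min_value):
    """Number of runs of consecutive elements strictly above min_value."""
    flags = (v > min_value for v in projection)
    return sum(1 for above, _ in groupby(flags) if above)


# Two blocks (counts around 40) joined by a bridge of count 1
projection = [0, 40, 42, 41, 1, 1, 38, 40, 0]
min_value = int(0.1 * sum(projection) / len(projection))

regions_zero = count_regions(projection, 0)
regions_mean = count_regions(projection, min_value)
print(min_value, regions_zero, regions_mean)
```

With a threshold of 0 the bridge keeps everything in one region; with 10% of the mean, the bridge falls below the threshold and the two blocks separate.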

Final result:

A few small flaws, but acceptable overall.

Summary

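Image cutting based on pixel projection of a binarized image suits images with a simple structure (solid-color background, clear horizontal or vertical layout). However, the direction of the first cut and the number of cutting iterations must be known in advance, and choosing some of the parameters along the way is fiddly, so the method does not generalize well. I will share other image-cutting methods in later posts. If you spot any mistakes, corrections are welcome.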

Source code

import cv2
import numpy as np
from lib import * # custom library; only its adaptive thresholding function is used so far, and its implementation is given at the very beginning
import random

def get_projection_list_demo(binary_img):
    h, w = binary_img.shape[:2]
    row_list = [0] * h
    col_list = [0] * w
    for row in range(h):
        for col in range(w):
            if binary_img[row, col] == 0:
                row_list[row] = row_list[row] + 1
                col_list[col] = col_list[col] + 1

    temp_img_1 = 255 - np.zeros((binary_img.shape[0], max(row_list)))
    for row in range(h):
        for i in range(row_list[row]):
            temp_img_1[row, i] = 0
    cv2.imshow('horizontal', temp_img_1)

    temp_img_2 = 255 - np.zeros((max(col_list), binary_img.shape[1]))
    for col in range(w):
        for i in range(col_list[col]):
            temp_img_2[i, col] = 0
    cv2.imshow('vertical', temp_img_2)


def get_projection_list(binary_img, direction='horizontal'):
    h, w = binary_img.shape[:2]
    row_list = [0] * h
    col_list = [0] * w
    for row in range(h):
        for col in range(w):
            if binary_img[row, col] == 0:
                row_list[row] = row_list[row] + 1
                col_list[col] = col_list[col] + 1
    if direction == 'horizontal':
        return row_list
    else:
        return col_list


def split_projection_list(projectionList: list, minValue=0):
    start = 0
    end = None

    split_list = []
    for idx, value in enumerate(projectionList):
        if value > minValue:
            end = idx
        else:
            if end is not None:
                split_list.append((start, end))
                end = None
            start = idx
    else:
        if end is not None:
            split_list.append((start, end))
            end = None
    return split_list


def cut_binary_img(binary_img, startX, startY, direction='horizontal', iteration=2):
    img_h, img_w = binary_img.shape[:2]
    if iteration <= 0:
        # Leaf node: no further cuts, so there are no children
        return {
            'rect': (startX, startY, img_w, img_h),
            'children': None
        }

    children = []

    projection_list = get_projection_list(binary_img, direction)
    minValue = int(0.1 * sum(projection_list) / len(projection_list))
    # print(minValue)
    split_list = split_projection_list(projection_list, minValue)
    for start, end in split_list:
        if end - start < 5:
            continue
        if direction == 'horizontal':
            x, y, w, h = 0, start, img_w, end - start
        else:
            x, y, w, h = start, 0, end - start, img_h

        roi = binary_img[y:y+h, x:x+w]
        if direction == 'horizontal':
            next_direction = 'vertical'
        else:
            next_direction = 'horizontal'
        # Offsets are accumulated so every rect is in full-image coordinates
        child = cut_binary_img(roi, startX + x, startY + y, next_direction, iteration - 1)

        children.append(child)

    root = {
        'rect': (startX, startY, img_w, img_h),
        'children': children
    }
    return root

def get_leaf_node(root):
    leaf_rects = []
    if root['children'] is None:
        leaf_rect = root['rect']
        leaf_rects.append(leaf_rect)
    else:
        for child in root['children']:
            rects = get_leaf_node(child)
            leaf_rects.extend(rects)
    return leaf_rects

def draw_rects(img, rects):
    new_img = img.copy()
    for x, y, w, h in rects:
        p1 = (x, y)
        p2 = (x + w, y + h)
        
        color = (random.randint(0,255), random.randint(0,255), random.randint(0,255))
        # color = (0, 0, 255)
        cv2.rectangle(new_img, p1, p2, color, 2)
    return new_img

if __name__ == "__main__":
    img_path = 'xxx.jpg'
    img = cv2.imread(img_path)
    blurred = cv2.GaussianBlur(img, (3, 3), 0)
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)

    binary_img = adaptive_threshold(gray, blockSize=15)
    # binary_img = threshold(gray)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 1))
    erode_img = cv2.erode(binary_img, kernel)

    H = 'horizontal'
    V =  'vertical'
    root = cut_binary_img(erode_img, 0, 0, direction=V, iteration=3)

    rects = get_leaf_node(root)
    new_img = draw_rects(img, rects)

    # get_projection_list_demo(binary_img)

    cv2.imshow('new_img', new_img)
    cv2.imshow('src', img)
    cv2.imshow('erode_img', erode_img)
    cv2.imshow('binary_img', binary_img)
    cv2.waitKey(0)
