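OCR test
After covering mouse handling, morphological operations, and so on, let's try a slightly bigger technical exercise.
First, prepare an image that contains nothing but text. (This one is part of the Baidu Baike entry for 幽弥狂.)
We have only just gotten started with OpenCV, so no fancy tricks here: the first test uses an image that is all text.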
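The basic OCR flow is:
1. The source file to run OCR on (a PNG image)
2. Read the source file with OpenCV (imread())
3. Convert the image to grayscale (cvtColor())
4. Binarize the image (using a threshold)
5. Cut the black-and-white image into the smallest possible single-character images
6. Recognize each image
7. Merge the results
8. Format the results
9. Output the results
This article stops once the image has been cut into single-character images.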
Let's follow the flow.
The source file needs little explanation: just take a screenshot of anything that is all text.
Reading the file
Reading the file simply means loading the test image:
cv::Mat mImg = cv::imread("opencv_ocr_src.png");
The result looks like this:
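Grayscale conversion
There are two ways to get a grayscale image: set a flag when reading the file, or convert the source image after it has been read.
This article uses the second: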
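cv::Mat getGrayImg(cv::Mat mImg)
{
    cv::Mat grayImg;
    if (mImg.channels() == 3)
    {
        // Three channels: OpenCV loads color images in BGR order.
        cv::cvtColor(mImg, grayImg, cv::COLOR_BGR2GRAY);
    }
    else
    {
        // Assumed to be single-channel already (four-channel input is not handled).
        grayImg = mImg;
    }
    return grayImg;
}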
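The input is the original image, which may have one, three, or four channels. The code assumes that anything other than three channels is single-channel, that is, already a grayscale or black-and-white image that needs no processing (four-channel images are ignored for now). The output is a grayscale image.
The result looks like this: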
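To make the conversion less of a black box, here is a minimal sketch of the per-pixel weighted sum that OpenCV documents for its RGB/BGR to gray conversion, Y = 0.299 R + 0.587 G + 0.114 B. The helper name `bgrToGray` is hypothetical and not part of OpenCV, and OpenCV's own fixed-point rounding may differ from this floating-point version by one gray level.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not an OpenCV function): gray value of one pixel
// whose channels are given in OpenCV's default B, G, R order.
std::uint8_t bgrToGray(std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    // Weighted sum of the three channels; green contributes the most.
    const double y = 0.299 * r + 0.587 * g + 0.114 * b;
    // Round to the nearest integer and clamp into the 8-bit range.
    const long rounded = std::lround(y);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, rounded)));
}
```

Black and white pixels keep their values under this formula, while a pure blue pixel ends up very dark because blue has the smallest weight.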
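Binarization
The grayscale image above is converted to a binary image with a threshold. There are many ways to choose one. The approach used here: sum the pixel values of the grayscale image, take the mean, and use two thirds of that mean as the threshold (the code scales the mean by 2/3). If black pixels are not in the minority after thresholding, the image is inverted, so that white (background) pixels outnumber black (text) pixels. The projection code later in this article counts black pixels as text.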
cv::Mat getBinImg(cv::Mat grayImg, int threholdValue = -1)
{
    assert(!grayImg.empty());
    cv::Mat result;
    int middle = 0;
    if (threholdValue == -1)
    {
        // No threshold given: derive one from the mean gray value.
        long long sum = 0;   // 64-bit, so large images cannot overflow the sum
        long long count = 0;
        for (int i = 0; i < grayImg.rows; i++)
        {
            const uchar *p = grayImg.ptr<uchar>(i);
            for (int j = 0; j < grayImg.cols; j++)
            {
                sum += p[j];
                count++;
            }
        }
        // Two thirds of the mean is used as the threshold.
        middle = int((sum / count) * 2 / 3);
    }
    else
    {
        middle = threholdValue;
    }
    // Note: the value printed here is the final threshold, not the raw mean.
    std::cout << "Average pixel is: " << middle << std::endl;
    cv::threshold(grayImg, result, middle, 255, cv::THRESH_BINARY);

    // Count black and white pixels in the binary image.
    int white = 0;
    int black = 0;
    for (int i = 0; i < result.rows; i++)
    {
        const uchar *ptr = result.ptr<uchar>(i);
        for (int j = 0; j < result.cols; j++)
        {
            if (ptr[j] == 0)
                black++;
            else
                white++;
        }
    }
    // Invert when black dominates, so text ends up black on a white background.
    if (black >= white)
        result = ~result;
    return result;
}
The result looks like this:
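Since there are many ways to choose a threshold, here is a sketch of one well-known alternative to the scaled mean: Otsu's method, which picks the threshold that maximizes the variance between the two resulting classes. This is not what the code in this article does. The helper `otsuThreshold` is hypothetical and works on a flat list of gray values so that it needs nothing beyond the standard library. OpenCV also offers a `cv::THRESH_OTSU` flag for `cv::threshold`, which would normally be the more convenient choice.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: Otsu threshold for 8-bit gray values.
// Pixels <= t form one class and pixels > t the other, which matches
// how cv::THRESH_BINARY treats its threshold argument.
int otsuThreshold(const std::vector<std::uint8_t> &pixels)
{
    if (pixels.empty())
        return 0;

    // Histogram of the 256 possible gray levels.
    std::vector<long long> hist(256, 0);
    for (std::uint8_t v : pixels)
        ++hist[v];

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int v = 0; v < 256; ++v)
        sumAll += static_cast<double>(v) * static_cast<double>(hist[v]);

    double weightLow = 0.0; // number of pixels with value <= t
    double sumLow = 0.0;    // sum of those pixel values
    double bestVariance = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t)
    {
        weightLow += static_cast<double>(hist[t]);
        sumLow += static_cast<double>(t) * static_cast<double>(hist[t]);
        if (weightLow == 0.0)
            continue; // nothing in the low class yet
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0)
            break; // everything is in the low class from here on

        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        // Between-class variance, up to a constant factor.
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance)
        {
            bestVariance = variance;
            best = t;
        }
    }
    return best;
}
```

For an image with two clearly separated groups of gray values, the returned threshold falls between the two groups. Whether it beats the scaled mean on a given screenshot has to be checked by eye.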
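Cutting the image
x-axis projection
Looking along the x direction of the image, there is a gap between every two lines of text. If each row is projected sideways, rows that contain text project as black and rows without text project as white.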
std::vector<cv::Mat> getShadowXResult(cv::Mat bin)
{
    assert(!bin.empty());
    // Canvas for visualizing the projection (white background).
    cv::Mat painty(bin.size(), CV_8UC1, cv::Scalar(255));
    // Number of black (text) pixels in every row.
    std::vector<int> pointcount(bin.rows, 0);
    for (int i = 0; i < bin.rows; i++)
    {
        for (int j = 0; j < bin.cols; j++)
        {
            if (bin.at<uchar>(i, j) == 0)
            {
                pointcount[i]++;
            }
        }
    }
    // Draw each row's count as a black bar starting at the left edge.
    for (int i = 0; i < bin.rows; i++)
    {
        for (int j = 0; j < pointcount[i]; j++)
        {
            painty.at<uchar>(i, j) = 0;
        }
    }
    cv::imwrite("x.png", painty);

    // Split the image into blocks of consecutive non-empty rows.
    std::vector<cv::Mat> result;
    int startindex = 0;
    int endindex = 0;
    bool inblock = false;
    for (int i = 0; i < painty.rows; i++)
    {
        if (!inblock && pointcount[i] != 0)
        {
            inblock = true;
            startindex = i;
        }
        if (inblock && pointcount[i] == 0)
        {
            endindex = i;
            inblock = false;
            // rowRange's end is exclusive; +1 keeps one blank row as margin.
            cv::Mat roi = bin.rowRange(startindex, endindex + 1);
            result.push_back(roi);
        }
    }
    // A block that reaches the last row would otherwise be lost.
    if (inblock)
    {
        result.push_back(bin.rowRange(startindex, bin.rows));
    }
    return result;
}
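The result looks like this:
Scan from row 0 to the last row: when a row with black pixels is reached, record its position as px1; when the next all-white row is reached, record its position as px2. With the common image width w, the sub-rectangle holding one line of text is then (0, px1, w, px2 - px1), written as (x, y, width, height).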
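The scanning rule above can be isolated from OpenCV as a small function over the projection counts. The following is a minimal sketch under that assumption; `findRuns` is a hypothetical helper name, and it returns half-open ranges [start, end), so a run [px1, px2) corresponds to the rectangle with x = 0, y = px1, width w, and height px2 - px1.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: given the number of text pixels per row (or per
// column), return the half-open index ranges [start, end) of consecutive
// non-empty entries.
std::vector<std::pair<int, int>> findRuns(const std::vector<int> &profile)
{
    std::vector<std::pair<int, int>> runs;
    const int n = static_cast<int>(profile.size());
    int start = -1; // -1 means "currently not inside a run"
    for (int i = 0; i < n; ++i)
    {
        if (profile[i] != 0 && start < 0)
        {
            start = i; // first non-empty entry: a run begins
        }
        else if (profile[i] == 0 && start >= 0)
        {
            runs.emplace_back(start, i); // first empty entry: the run ends
            start = -1;
        }
    }
    // A run that touches the last entry has no empty entry to close it.
    if (start >= 0)
    {
        runs.emplace_back(start, n);
    }
    return runs;
}
```

The same function works for both directions: feed it per-row counts to find text lines, or per-column counts of a single line to find characters.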
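y-axis projection
Within each line of text, there is a gap between characters. Projecting the text pixels of a line onto the y axis gives a pixel distribution that looks like a bar chart. Just as with the x-axis projection, we cut at the boundaries between black and white, which yields the individual characters.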
std::vector<cv::Mat> getShadowYResult(cv::Mat bin)
{
    assert(!bin.empty());
    // Canvas for visualizing the projection (white background).
    cv::Mat paintx(bin.size(), CV_8UC1, cv::Scalar(255));
    // Number of black (text) pixels in every column.
    std::vector<int> pointcount(bin.cols, 0);
    for (int i = 0; i < bin.rows; i++)
    {
        for (int j = 0; j < bin.cols; j++)
        {
            if (bin.at<uchar>(i, j) == 0)
            {
                pointcount[j]++;
            }
        }
    }
    // Draw each column's count as a black bar rising from the bottom edge.
    for (int i = 0; i < bin.cols; i++)
    {
        for (int j = 0; j < pointcount[i]; j++)
        {
            paintx.at<uchar>(bin.rows - 1 - j, i) = 0;
        }
    }
    cv::imwrite("y.png", paintx);

    // Split the line into blocks of consecutive non-empty columns.
    std::vector<cv::Mat> result;
    int startindex = 0;
    int endindex = 0;
    bool inblock = false;
    for (int i = 0; i < paintx.cols; i++)
    {
        if (!inblock && pointcount[i] != 0)
        {
            inblock = true;
            startindex = i;
        }
        if (inblock && pointcount[i] == 0)
        {
            endindex = i;
            inblock = false;
            // colRange's end is exclusive; +1 keeps one blank column as margin.
            cv::Mat roi = bin.colRange(startindex, endindex + 1);
            result.push_back(roi);
        }
    }
    // A character that touches the right edge would otherwise be lost.
    if (inblock)
    {
        result.push_back(bin.colRange(startindex, bin.cols));
    }
    return result;
}
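The projection looks like this:
Scan the columns from left to right: a column with black pixels marks the start position py1, and the next all-white column means the character has ended, at position py2. The height of a line of text is a fixed h (the height of the line image), so the box of a single character is (py1, 0, py2 - py1, h), written as (x, y, width, height).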
The code above cuts a single line. For multi-line text, repeat it: run the y-axis cut on every line produced by the x-axis cut.
int count = 0; // running index used for the output file names
std::vector<cv::Mat> xresult = getShadowXResult(binImg);
for (const cv::Mat &line : xresult)
{
    // Cut every text line into single characters.
    std::vector<cv::Mat> yresult = getShadowYResult(line);
    for (const cv::Mat &character : yresult)
    {
        // The output directory must already exist.
        std::string filePath = "/home/jackey/Downloads/result/" + std::to_string(count++) + ".png";
        std::cout << filePath << std::endl;
        cv::imwrite(filePath, character);
    }
}
Result
After cutting by rows and then by columns, we have one image for each character in the text image.
Complete code:
#include <opencv2/opencv.hpp>
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Convert a three-channel BGR image to grayscale; other inputs pass through.
cv::Mat getGrayImg(cv::Mat mImg)
{
    cv::Mat grayImg;
    if (mImg.channels() == 3)
    {
        cv::cvtColor(mImg, grayImg, cv::COLOR_BGR2GRAY);
    }
    else
    {
        // Assumed to be single-channel already (four-channel input is not handled).
        grayImg = mImg;
    }
    return grayImg;
}

// Binarize with the given threshold, or with 2/3 of the mean gray value.
cv::Mat getBinImg(cv::Mat grayImg, int threholdValue = -1)
{
    assert(!grayImg.empty());
    cv::Mat result;
    int middle = 0;
    if (threholdValue == -1)
    {
        long long sum = 0;   // 64-bit, so large images cannot overflow the sum
        long long count = 0;
        for (int i = 0; i < grayImg.rows; i++)
        {
            const uchar *p = grayImg.ptr<uchar>(i);
            for (int j = 0; j < grayImg.cols; j++)
            {
                sum += p[j];
                count++;
            }
        }
        middle = int((sum / count) * 2 / 3);
    }
    else
    {
        middle = threholdValue;
    }
    // Note: the value printed here is the final threshold, not the raw mean.
    std::cout << "Average pixel is: " << middle << std::endl;
    cv::threshold(grayImg, result, middle, 255, cv::THRESH_BINARY);

    int white = 0;
    int black = 0;
    for (int i = 0; i < result.rows; i++)
    {
        const uchar *ptr = result.ptr<uchar>(i);
        for (int j = 0; j < result.cols; j++)
        {
            if (ptr[j] == 0)
                black++;
            else
                white++;
        }
    }
    // Invert when black dominates, so text ends up black on a white background.
    if (black >= white)
        result = ~result;
    return result;
}

// Project onto rows and cut the image into text lines.
std::vector<cv::Mat> getShadowXResult(cv::Mat bin)
{
    assert(!bin.empty());
    cv::Mat painty(bin.size(), CV_8UC1, cv::Scalar(255));
    std::vector<int> pointcount(bin.rows, 0); // black pixels per row
    for (int i = 0; i < bin.rows; i++)
    {
        for (int j = 0; j < bin.cols; j++)
        {
            if (bin.at<uchar>(i, j) == 0)
            {
                pointcount[i]++;
            }
        }
    }
    for (int i = 0; i < bin.rows; i++)
    {
        for (int j = 0; j < pointcount[i]; j++)
        {
            painty.at<uchar>(i, j) = 0;
        }
    }
    cv::imwrite("x.png", painty);

    std::vector<cv::Mat> result;
    int startindex = 0;
    int endindex = 0;
    bool inblock = false;
    for (int i = 0; i < painty.rows; i++)
    {
        if (!inblock && pointcount[i] != 0)
        {
            inblock = true;
            startindex = i;
        }
        if (inblock && pointcount[i] == 0)
        {
            endindex = i;
            inblock = false;
            // rowRange's end is exclusive; +1 keeps one blank row as margin.
            cv::Mat roi = bin.rowRange(startindex, endindex + 1);
            result.push_back(roi);
        }
    }
    // A block that reaches the last row would otherwise be lost.
    if (inblock)
    {
        result.push_back(bin.rowRange(startindex, bin.rows));
    }
    return result;
}

// Project onto columns and cut one text line into characters.
std::vector<cv::Mat> getShadowYResult(cv::Mat bin)
{
    assert(!bin.empty());
    cv::Mat paintx(bin.size(), CV_8UC1, cv::Scalar(255));
    std::vector<int> pointcount(bin.cols, 0); // black pixels per column
    for (int i = 0; i < bin.rows; i++)
    {
        for (int j = 0; j < bin.cols; j++)
        {
            if (bin.at<uchar>(i, j) == 0)
            {
                pointcount[j]++;
            }
        }
    }
    for (int i = 0; i < bin.cols; i++)
    {
        for (int j = 0; j < pointcount[i]; j++)
        {
            paintx.at<uchar>(bin.rows - 1 - j, i) = 0;
        }
    }
    cv::imwrite("y.png", paintx);

    std::vector<cv::Mat> result;
    int startindex = 0;
    int endindex = 0;
    bool inblock = false;
    for (int i = 0; i < paintx.cols; i++)
    {
        if (!inblock && pointcount[i] != 0)
        {
            inblock = true;
            startindex = i;
        }
        if (inblock && pointcount[i] == 0)
        {
            endindex = i;
            inblock = false;
            // colRange's end is exclusive; +1 keeps one blank column as margin.
            cv::Mat roi = bin.colRange(startindex, endindex + 1);
            result.push_back(roi);
        }
    }
    // A character that touches the right edge would otherwise be lost.
    if (inblock)
    {
        result.push_back(bin.colRange(startindex, bin.cols));
    }
    return result;
}

int main()
{
    int count = 0;
    cv::Mat mImg = cv::imread("opencv_ocr_src.png");
    // Check before writing: cv::imwrite cannot save an empty image.
    if (mImg.empty())
    {
        std::cout << "Reading image failed" << std::endl;
        return 1;
    }
    cv::imwrite("src.png", mImg);

    cv::Mat grayImg = getGrayImg(mImg);
    assert(!grayImg.empty());
    cv::imwrite("gray.png", grayImg);

    cv::Mat binImg = getBinImg(grayImg);
    assert(!binImg.empty());
    cv::imwrite("bin.png", binImg);

    std::vector<cv::Mat> xresult = getShadowXResult(binImg);
    for (const cv::Mat &line : xresult)
    {
        std::vector<cv::Mat> yresult = getShadowYResult(line);
        for (const cv::Mat &character : yresult)
        {
            // The output directory must already exist.
            std::string filePath = "/home/jackey/Downloads/result/" + std::to_string(count++) + ".png";
            std::cout << filePath << std::endl;
            cv::imwrite(filePath, character);
        }
    }
    return 0;
}
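Problems
" is a single character (the left half of a pair of double quotation marks), but because there are white pixels in the middle of it, it gets cut into two characters.
Only images that are pure text can be cut.
AI is very popular right now, and the best approach is to train a neural network (deep learning) to classify and segment the characters.
If I get around to testing that, I will update this post later.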
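Short of training a model, one cheap heuristic for the first problem is to merge neighboring column runs whose gap is very narrow, on the assumption that a gap of only a pixel or two lies inside a character rather than between characters. The sketch below is not something tested in this article. `mergeCloseRuns` and `maxGap` are hypothetical names, and runs are assumed to be sorted, non-overlapping, half-open ranges [start, end).

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: merge consecutive runs when the blank gap between
// them is at most maxGap columns wide.
std::vector<std::pair<int, int>> mergeCloseRuns(
    const std::vector<std::pair<int, int>> &runs, int maxGap)
{
    std::vector<std::pair<int, int>> merged;
    for (const auto &run : runs)
    {
        // Gap = start of this run minus end of the previously kept run.
        if (!merged.empty() && run.first - merged.back().second <= maxGap)
        {
            merged.back().second = run.second; // extend the previous run
        }
        else
        {
            merged.push_back(run); // far enough away: keep as a new run
        }
    }
    return merged;
}
```

The trade-off is that a `maxGap` large enough to rejoin a split quotation mark may also glue together two narrow characters that sit close to each other, so the value would have to be tuned per font and image size.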