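OCR test

After covering mouse handling, morphological operations, and so on, let's try a somewhat larger technical exercise.
First, prepare an image that contains nothing but text. (This one is part of the Baidu Baike entry for 幽弥狂.)
We have only just gotten started with OpenCV, so no fancy tricks: the first test uses a text-only image.
The basic OCR pipeline is:
1. The source file to run OCR on (a PNG image)
2. Read the source file with OpenCV (imread())
3. Convert the image to grayscale (cvtColor())
4. Binarize the image (using a threshold)
5. Cut the black-and-white image into the smallest possible single-character images
6. Recognize each of those images
7. Merge the results
8. Format the results
9. Output the results

This article stops once the image has been cut into single-character images.
Following the pipeline:
There is not much to say about the source file; any screenshot that consists entirely of text will do.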
Reading the file

Reading the file simply means loading the test image:

    cv::Mat mImg = cv::imread("opencv_ocr_src.png");

The result: [figure: the source image]
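Grayscale conversion

There are two ways to get a grayscale image: pass a read flag when loading the file, or convert the source image after loading it.
This article uses the second:

    // Return a single-channel grayscale version of the input image.
    cv::Mat getGrayImg(cv::Mat mImg)
    {
        cv::Mat grayImg;
        if (mImg.channels() == 3) {
            // Three-channel images loaded by imread are stored in BGR order.
            cv::cvtColor(mImg, grayImg, cv::COLOR_BGR2GRAY);
        } else {
            // Assumed to be single-channel already (four channels are not handled).
            grayImg = mImg;
        }
        return grayImg;
    }

The input is the original image, which may have one, three, or four channels. Anything that is not three-channel is treated as single-channel, meaning it is already a grayscale or black-and-white image and needs no processing (four-channel images are ignored for now). The output is a grayscale image.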
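To make the three-channel case less of a black box, here is a minimal sketch of the weighted sum that a BGR-to-gray conversion is based on (Y = 0.299 R + 0.587 G + 0.114 B). The function name `bgrToGray` and the flat interleaved buffer are my own assumptions for illustration; OpenCV's internal rounding may differ slightly, so treat this as an approximation of `cvtColor`, not a replacement for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved BGR buffer (B, G, R, B, G, R, ...) to grayscale.
// Hypothetical helper for illustration; not part of OpenCV.
std::vector<std::uint8_t> bgrToGray(const std::vector<std::uint8_t>& bgr)
{
    assert(bgr.size() % 3 == 0);  // three bytes per pixel
    std::vector<std::uint8_t> gray;
    gray.reserve(bgr.size() / 3);
    for (std::size_t i = 0; i < bgr.size(); i += 3) {
        // Weighted sum of blue, green, and red, in that storage order.
        const double y = 0.114 * bgr[i] + 0.587 * bgr[i + 1] + 0.299 * bgr[i + 2];
        gray.push_back(static_cast<std::uint8_t>(y + 0.5));  // round to nearest
    }
    return gray;
}
```

Green carries the largest weight, so a pure green pixel ends up much brighter in the gray image than a pure blue one.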
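The result: [figure: the grayscale image]

Binarization

The grayscale image above is turned into a binary image with a threshold. There are many ways to pick one. The approach used here is to average the pixel values of the whole grayscale image and derive the threshold from that mean (the code uses two thirds of it). If the resulting image has more black pixels than white ones, it is inverted, so the text always ends up black on a white background, which is what the cutting code later counts on.

    // Binarize a grayscale image. With threholdValue == -1 the threshold is
    // derived from the mean gray level; otherwise the given value is used.
    cv::Mat getBinImg(cv::Mat grayImg, int threholdValue = -1)
    {
        assert(!grayImg.empty());
        cv::Mat result;
        int middle = 0;
        if (threholdValue == -1) {
            long long sum = 0;    // 64-bit, so large images cannot overflow
            long long count = 0;
            for (int i = 0; i < grayImg.rows; i++) {
                const uchar *p = grayImg.ptr<uchar>(i);
                for (int j = 0; j < grayImg.cols; j++) {
                    sum += p[j];
                    count++;
                }
            }
            // Two thirds of the mean gray level.
            middle = int((sum / count) * 2 / 3);
        } else {
            middle = threholdValue;
        }
        std::cout << "Threshold is: " << middle << std::endl;
        cv::threshold(grayImg, result, middle, 255, cv::THRESH_BINARY);

        // Count black and white pixels in the binary image.
        long long white = 0;
        long long black = 0;
        for (int i = 0; i < result.rows; i++) {
            const uchar *ptr = result.ptr<uchar>(i);
            for (int j = 0; j < result.cols; j++) {
                if (ptr[j] == 0)
                    black++;
                else
                    white++;
            }
        }
        // Invert when black dominates, so the text is black on white.
        if (black >= white)
            result = ~result;
        return result;
    }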
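The thresholding logic is easier to check on a tiny buffer than on a real image. Below is a small sketch of the same idea without OpenCV: the threshold is two thirds of the integer mean, pixels above it become 255, and the result is inverted when black is in the majority. The names `meanThreshold` and `binarize` and the flat `std::vector` image are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Threshold = integer mean gray level scaled by 2/3.
int meanThreshold(const std::vector<std::uint8_t>& gray)
{
    assert(!gray.empty());
    long long sum = 0;  // 64-bit accumulator, safe for large images
    for (std::uint8_t v : gray) sum += v;
    const long long mean = sum / static_cast<long long>(gray.size());
    return static_cast<int>(mean * 2 / 3);
}

// Pixels above the threshold become 255, all others 0.
// If black is in the majority afterwards, invert the image.
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray)
{
    const int t = meanThreshold(gray);
    std::vector<std::uint8_t> bin(gray.size());
    std::size_t black = 0;
    for (std::size_t i = 0; i < gray.size(); ++i) {
        bin[i] = gray[i] > t ? 255 : 0;
        if (bin[i] == 0) ++black;
    }
    if (black >= gray.size() - black) {
        for (std::uint8_t& v : bin) v = static_cast<std::uint8_t>(255 - v);
    }
    return bin;
}
```

For the four pixels {200, 200, 200, 30} the mean is 157 and the threshold 104, so only the last pixel turns black. For {10, 10, 10, 240} the first pass produces mostly black, and the inversion flips it to the same black-on-white layout.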
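The result: [figure: the binary image]

Cutting the image

X-axis projection

Looking along the x direction of the image, there is a gap between every two lines of text. If each row is projected sideways onto one edge, the rows that contain text project as black and the rows without text project as white.

    // Split a binary image (black text on white) into one strip per text line.
    std::vector<cv::Mat> getShadowXResult(cv::Mat bin)
    {
        assert(!bin.empty());
        // White canvas used to visualize the projection.
        cv::Mat painty(bin.size(), CV_8UC1, cv::Scalar(255));
        // pointcount[i] = number of black pixels in row i.
        std::vector<int> pointcount(bin.rows, 0);
        for (int i = 0; i < bin.rows; i++) {
            for (int j = 0; j < bin.cols; j++) {
                if (bin.at<uchar>(i, j) == 0) {
                    pointcount[i]++;
                }
            }
        }
        // Draw each row's count as a black bar starting at the left edge.
        for (int i = 0; i < bin.rows; i++) {
            for (int j = 0; j < pointcount[i]; j++) {
                painty.at<uchar>(i, j) = 0;
            }
        }
        cv::imwrite("x.png", painty);

        std::vector<cv::Mat> result;
        int startindex = 0;
        int endindex = 0;
        bool inblock = false;
        for (int i = 0; i < painty.rows; i++) {
            if (!inblock && pointcount[i] != 0) {   // entering a text line
                inblock = true;
                startindex = i;
            }
            if (inblock && pointcount[i] == 0) {    // leaving a text line
                endindex = i;
                inblock = false;
                // The end of rowRange is exclusive, so this keeps one blank row.
                cv::Mat roi = bin.rowRange(startindex, endindex + 1);
                result.push_back(roi);
            }
        }
        // A line that touches the bottom edge has no blank row after it.
        if (inblock)
            result.push_back(bin.rowRange(startindex, bin.rows));
        return result;
    }

The result: [figure: the x-axis projection, saved as x.png]
Scanning from row 0 to the last row, record the position px1 when a row containing black pixels is first encountered, and the position px2 when an all-white row shows up again. Every line spans the full image width w, so the sub-rectangle that frames one line of text is (0, px1, w, px2 - px1), written as (x, y, width, height).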
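The px1/px2 bookkeeping can be separated from OpenCV entirely, which makes it easy to test. Here is a minimal sketch under my own assumptions: the binary image is a flat row-major buffer, and the helper names `rowProjection` and `findRuns` are made up for this example. Unlike the function above, the ranges are exact half-open intervals [px1, px2) with no extra blank row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Count the black (0) pixels in every row of a row-major binary image.
std::vector<int> rowProjection(const std::vector<std::uint8_t>& bin, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    assert(bin.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<int> counts(rows, 0);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (bin[static_cast<std::size_t>(r) * cols + c] == 0) ++counts[r];
    return counts;
}

// Return the half-open [start, end) index ranges where counts[i] != 0.
std::vector<std::pair<int, int>> findRuns(const std::vector<int>& counts)
{
    std::vector<std::pair<int, int>> runs;
    int start = -1;  // -1 means "currently not inside a run"
    const int n = static_cast<int>(counts.size());
    for (int i = 0; i < n; ++i) {
        if (counts[i] != 0 && start < 0) start = i;   // px1: run begins
        if (counts[i] == 0 && start >= 0) {           // px2: run ends
            runs.emplace_back(start, i);
            start = -1;
        }
    }
    if (start >= 0) runs.emplace_back(start, n);  // run touching the last index
    return runs;
}
```

Each returned pair (px1, px2) corresponds to the rectangle (0, px1, w, px2 - px1). The same `findRuns` works unchanged on per-column counts, which is all the character cut in the next step needs.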
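Y-axis projection

Within each line of text there is a gap between every two characters. Projecting the text pixels of a line column by column gives a pixel distribution that looks like a bar chart. As with the x-axis projection, we cut at the gaps between the black and white runs, which yields the individual characters.

    // Split one text line (black text on white) into one image per character.
    std::vector<cv::Mat> getShadowYResult(cv::Mat bin)
    {
        assert(!bin.empty());
        // White canvas used to visualize the projection.
        cv::Mat paintx(bin.size(), CV_8UC1, cv::Scalar(255));
        // pointcount[j] = number of black pixels in column j.
        std::vector<int> pointcount(bin.cols, 0);
        for (int i = 0; i < bin.rows; i++) {
            for (int j = 0; j < bin.cols; j++) {
                if (bin.at<uchar>(i, j) == 0) {
                    pointcount[j]++;
                }
            }
        }
        // Draw each column's count as a black bar rising from the bottom edge.
        for (int i = 0; i < bin.cols; i++) {
            for (int j = 0; j < pointcount[i]; j++) {
                paintx.at<uchar>(bin.rows - 1 - j, i) = 0;
            }
        }
        cv::imwrite("y.png", paintx);

        std::vector<cv::Mat> result;
        int startindex = 0;
        int endindex = 0;
        bool inblock = false;
        for (int i = 0; i < paintx.cols; i++) {
            if (!inblock && pointcount[i] != 0) {   // entering a character
                inblock = true;
                startindex = i;
            }
            if (inblock && pointcount[i] == 0) {    // leaving a character
                endindex = i;
                inblock = false;
                // The end of colRange is exclusive, so this keeps one blank column.
                cv::Mat roi = bin.colRange(startindex, endindex + 1);
                result.push_back(roi);
            }
        }
        // A character that touches the right edge has no blank column after it.
        if (inblock)
            result.push_back(bin.colRange(startindex, bin.cols));
        return result;
    }

The projection result: [figure: the y-axis projection, saved as y.png]
For each column, record the position py1 when a column containing black pixels is first encountered; an all-white column means the character has ended, at position py2. The height of a line of text is a fixed h (the height of the line image), so the box for a single character is (py1, 0, py2 - py1, h), again as (x, y, width, height).
The code above cuts a single line. For multi-line text the operation has to be repeated: run the y-axis cut on every line produced by the x-axis cut.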

    int count = 0;  // running index used for the output file names
    std::vector<cv::Mat> xresult = getShadowXResult(binImg);
    for (int i = 0; i < static_cast<int>(xresult.size()); i++) {
        cv::Mat temp = xresult.at(i);
        std::vector<cv::Mat> yresult = getShadowYResult(temp);
        for (int j = 0; j < static_cast<int>(yresult.size()); j++) {
            std::string filePath = "/home/jackey/Downloads/result/" + std::to_string(count++) + ".png";
            std::cout << filePath << std::endl;
            cv::imwrite(filePath, yresult.at(j));
        }
    }

Result

After cutting by rows and then by columns, we have an image of every single character in the text image.
Complete code:

    #include "opencv2/opencv.hpp"
    #include <cassert>
    #include <iostream>
    #include <string>
    #include <vector>

    // Return a single-channel grayscale version of the input image.
    cv::Mat getGrayImg(cv::Mat mImg)
    {
        cv::Mat grayImg;
        if (mImg.channels() == 3) {
            cv::cvtColor(mImg, grayImg, cv::COLOR_BGR2GRAY);
        } else {
            grayImg = mImg;  // assumed to be single-channel already
        }
        return grayImg;
    }

    // Binarize a grayscale image; -1 derives the threshold from the mean gray level.
    cv::Mat getBinImg(cv::Mat grayImg, int threholdValue = -1)
    {
        assert(!grayImg.empty());
        cv::Mat result;
        int middle = 0;
        if (threholdValue == -1) {
            long long sum = 0;    // 64-bit, so large images cannot overflow
            long long count = 0;
            for (int i = 0; i < grayImg.rows; i++) {
                const uchar *p = grayImg.ptr<uchar>(i);
                for (int j = 0; j < grayImg.cols; j++) {
                    sum += p[j];
                    count++;
                }
            }
            middle = int((sum / count) * 2 / 3);  // two thirds of the mean
        } else {
            middle = threholdValue;
        }
        std::cout << "Threshold is: " << middle << std::endl;
        cv::threshold(grayImg, result, middle, 255, cv::THRESH_BINARY);

        long long white = 0;
        long long black = 0;
        for (int i = 0; i < result.rows; i++) {
            const uchar *ptr = result.ptr<uchar>(i);
            for (int j = 0; j < result.cols; j++) {
                if (ptr[j] == 0)
                    black++;
                else
                    white++;
            }
        }
        // Invert when black dominates, so the text is black on white.
        if (black >= white)
            result = ~result;
        return result;
    }

    // Split a binary image (black text on white) into one strip per text line.
    std::vector<cv::Mat> getShadowXResult(cv::Mat bin)
    {
        assert(!bin.empty());
        cv::Mat painty(bin.size(), CV_8UC1, cv::Scalar(255));
        std::vector<int> pointcount(bin.rows, 0);  // black pixels per row
        for (int i = 0; i < bin.rows; i++) {
            for (int j = 0; j < bin.cols; j++) {
                if (bin.at<uchar>(i, j) == 0) {
                    pointcount[i]++;
                }
            }
        }
        // Draw the projection as black bars starting at the left edge.
        for (int i = 0; i < bin.rows; i++) {
            for (int j = 0; j < pointcount[i]; j++) {
                painty.at<uchar>(i, j) = 0;
            }
        }
        cv::imwrite("x.png", painty);

        std::vector<cv::Mat> result;
        int startindex = 0;
        int endindex = 0;
        bool inblock = false;
        for (int i = 0; i < painty.rows; i++) {
            if (!inblock && pointcount[i] != 0) {
                inblock = true;
                startindex = i;
            }
            if (inblock && pointcount[i] == 0) {
                endindex = i;
                inblock = false;
                cv::Mat roi = bin.rowRange(startindex, endindex + 1);
                result.push_back(roi);
            }
        }
        // A line that touches the bottom edge has no blank row after it.
        if (inblock)
            result.push_back(bin.rowRange(startindex, bin.rows));
        return result;
    }

    // Split one text line (black text on white) into one image per character.
    std::vector<cv::Mat> getShadowYResult(cv::Mat bin)
    {
        assert(!bin.empty());
        cv::Mat paintx(bin.size(), CV_8UC1, cv::Scalar(255));
        std::vector<int> pointcount(bin.cols, 0);  // black pixels per column
        for (int i = 0; i < bin.rows; i++) {
            for (int j = 0; j < bin.cols; j++) {
                if (bin.at<uchar>(i, j) == 0) {
                    pointcount[j]++;
                }
            }
        }
        // Draw the projection as black bars rising from the bottom edge.
        for (int i = 0; i < bin.cols; i++) {
            for (int j = 0; j < pointcount[i]; j++) {
                paintx.at<uchar>(bin.rows - 1 - j, i) = 0;
            }
        }
        cv::imwrite("y.png", paintx);

        std::vector<cv::Mat> result;
        int startindex = 0;
        int endindex = 0;
        bool inblock = false;
        for (int i = 0; i < paintx.cols; i++) {
            if (!inblock && pointcount[i] != 0) {
                inblock = true;
                startindex = i;
            }
            if (inblock && pointcount[i] == 0) {
                endindex = i;
                inblock = false;
                cv::Mat roi = bin.colRange(startindex, endindex + 1);
                result.push_back(roi);
            }
        }
        // A character that touches the right edge has no blank column after it.
        if (inblock)
            result.push_back(bin.colRange(startindex, bin.cols));
        return result;
    }

    int main()
    {
        int count = 0;  // running index used for the output file names
        cv::Mat mImg = cv::imread("opencv_ocr_src.png");
        if (mImg.empty()) {
            std::cout << "Reading image failed" << std::endl;
            return 1;
        }
        cv::imwrite("src.png", mImg);

        cv::Mat grayImg = getGrayImg(mImg);
        assert(!grayImg.empty());
        cv::imwrite("gray.png", grayImg);

        cv::Mat binImg = getBinImg(grayImg);
        assert(!binImg.empty());
        cv::imwrite("bin.png", binImg);

        // Cut into lines first, then cut every line into characters.
        std::vector<cv::Mat> xresult = getShadowXResult(binImg);
        for (int i = 0; i < static_cast<int>(xresult.size()); i++) {
            cv::Mat temp = xresult.at(i);
            std::vector<cv::Mat> yresult = getShadowYResult(temp);
            for (int j = 0; j < static_cast<int>(yresult.size()); j++) {
                std::string filePath = "/home/jackey/Downloads/result/" + std::to_string(count++) + ".png";
                std::cout << filePath << std::endl;
                cv::imwrite(filePath, yresult.at(j));
            }
        }
        return 0;
    }
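Problems

1. " is a single character, the left half of a pair of double quotation marks, but because there are white pixels in the middle of it, it gets cut into two characters.
2. Only pure-text images can be cut.

Artificial intelligence is very popular right now, and the best approach is to train a neural network (deep learning) to classify and segment the characters.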
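Short of training a network, one cheap heuristic for the split quotation mark is to merge column runs that are separated by only a very narrow gap. This is not something I have tried in the code above; it is a sketch under assumptions. The runs are half-open [start, end) column ranges sorted from left to right, and `mergeCloseRuns` and its `maxGap` parameter are hypothetical names invented for this example. A gap that is too generous will also glue together narrow characters that merely sit close to each other, so the value needs tuning per font and image size.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Merge neighbouring [start, end) runs when the blank gap between them is
// at most maxGap columns. Assumes the runs are sorted and do not overlap.
std::vector<std::pair<int, int>> mergeCloseRuns(
        const std::vector<std::pair<int, int>>& runs, int maxGap)
{
    assert(maxGap >= 0);
    std::vector<std::pair<int, int>> merged;
    for (const auto& run : runs) {
        if (!merged.empty() && run.first - merged.back().second <= maxGap) {
            merged.back().second = run.second;  // extend the previous run
        } else {
            merged.push_back(run);              // start a new run
        }
    }
    return merged;
}
```

With the runs (2,5), (6,9), (20,30) and maxGap = 2, the first two are merged into (2,9) and the third is left alone.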
If I get around to testing that, I will update this post later.