Text Extraction Algorithms for Document Images Including Forms

Cited by: 1

Abstract: An algorithm is proposed for extracting text from page images of documents that contain forms. A mask is scanned over the page to build rectangular bounding boxes around the foreground pixels, which extracts those pixels; neighboring boxes are then grouped into pattern lists according to a given criterion. Each pattern is classified using three statistical features: its maximum black run length, its height, and its width. Experimental results show that the algorithm handles ordinary forms and also extracts text successfully from skewed forms and flow charts. The algorithm applies only to binary document images.
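The abstract gives the pipeline only in outline, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a binary page stored as a list of rows with 1 for foreground, a non-overlapping square mask, grouping by 8-neighbour tile adjacency, and a single size threshold `max_char`. The function names, the mask size, the grouping rule, and the threshold are all hypothetical choices made for illustration.

```python
from collections import deque


def scan_boxes(img, mask=4):
    """Step a mask x mask window over the page without overlap and return
    {(tile_row, tile_col): (top, left, bottom, right)} for every tile that
    holds foreground pixels; each box is tight around those pixels."""
    h, w = len(img), len(img[0])
    boxes = {}
    for ty in range(0, h, mask):
        for tx in range(0, w, mask):
            pts = [(y, x)
                   for y in range(ty, min(ty + mask, h))
                   for x in range(tx, min(tx + mask, w))
                   if img[y][x]]
            if pts:
                ys = [p[0] for p in pts]
                xs = [p[1] for p in pts]
                boxes[(ty // mask, tx // mask)] = (min(ys), min(xs), max(ys), max(xs))
    return boxes


def group_boxes(boxes):
    """Chain boxes whose tiles touch (8-neighbourhood) into patterns.
    Assumed grouping rule; the paper's criterion is not given in the abstract."""
    seen, patterns = set(), []
    for start in boxes:
        if start in seen:
            continue
        seen.add(start)
        queue, chain = deque([start]), []
        while queue:
            r, c = queue.popleft()
            chain.append(boxes[(r, c)])
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in boxes and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        patterns.append(chain)
    return patterns


def max_black_run(img, top, left, bottom, right):
    """Longest horizontal or vertical run of foreground pixels in the region."""
    best = 0
    for y in range(top, bottom + 1):
        run = 0
        for x in range(left, right + 1):
            run = run + 1 if img[y][x] else 0
            best = max(best, run)
    for x in range(left, right + 1):
        run = 0
        for y in range(top, bottom + 1):
            run = run + 1 if img[y][x] else 0
            best = max(best, run)
    return best


def classify(img, chain, max_char=12):
    """Return (label, max_run, height, width) for one pattern.
    The threshold max_char is an illustrative assumption."""
    top = min(b[0] for b in chain)
    left = min(b[1] for b in chain)
    bottom = max(b[2] for b in chain)
    right = max(b[3] for b in chain)
    height, width = bottom - top + 1, right - left + 1
    run = max_black_run(img, top, left, bottom, right)
    is_text = run <= max_char and min(height, width) <= max_char
    return ("text" if is_text else "non-text", run, height, width)
```

Calling `[classify(img, chain) for chain in group_boxes(scan_boxes(img))]` labels every pattern on a page. Long ruling lines produce long black runs and fall into the non-text class, while character-sized patterns keep short runs. Text that touches a ruling line would be merged with it under this simple grouping rule, which is one reason the sketch should be read as an illustration rather than a reproduction of the paper's method.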

Authors: Wang Jiajun (王加俊), Li Yanling (李艳玲), Huang Xianwu (黄贤武), He Zhenya (何振亚)

Affiliations: School of Electronic Information, Soochow University; Department of Radio Engineering, Southeast University

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), CSCD, 2004, No. 4, pp. 381-385 (5 pages)

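Funding: National Natural Science Foundation of China (grant 30300088); Natural Science Foundation of the Jiangsu Provincial Department of Education (grant L0112419925).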

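Keywords: form document image; extraction algorithm; page; text; binary image; run length; rectangle; Chinese; experimental results; document image; page segmentation; pattern; text extraction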

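Classification: TP391 [Automation and Computer Technology: Computer Application Technology]; TP317 [Automation and Computer Technology: Computer Software and Theory]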
