An Identification Method for Mathematical Expressions in Scanned Chinese Document
Abstract: To locate mathematical expressions (MEs) in scanned Chinese documents that mix Chinese and English text, an ME identification method based on Chinese character recognition and ME symbol recognition is proposed. The method has three parts: Chinese character extraction, embedded ME extraction, and isolated ME location. In Chinese character extraction, three kinds of information are first collected for each character block: the Chinese character recognition result, the ME symbol recognition result, and the geometric features of the block. A decision tree then separates Chinese characters from non-Chinese characters. In embedded ME extraction, embedded MEs are located among the non-Chinese character blocks using the semantic information of ME symbols, the script (superscript and subscript) relations between adjacent symbols, and the semantic information of the expressions. In isolated ME location, layout-structure features are extracted from text lines that contain many embedded ME symbols and no Chinese characters, and a Gaussian mixture model distinguishes isolated MEs from ordinary text lines. On a test set of 148 document images containing 3,690 MEs, the method reaches an ME identification accuracy of 91.19%.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal; 2008, No. 4, pp. 83-87 (5 pages)
Funding: National High Technology Research and Development Program of China (863 Program), grant 2006AA01Z153
Keywords: artificial intelligence; pattern recognition; Chinese document; character recognition; mathematical expression; Gaussian mixture model
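
The abstract's last stage uses a Gaussian mixture model to separate isolated formulas from ordinary text lines, but it does not list the layout features or say how the model is trained. The sketch below only illustrates the general idea under loud assumptions: it uses a single hypothetical feature, `fill_ratio` (text width divided by column width, on the guess that displayed formulas are short and centered), fits a two-component one-dimensional mixture with EM in an unsupervised way, and treats the low-mean component as the formula class. None of these choices is taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def component_posteriors(x, model):
    """Posterior probability of each mixture component given x."""
    weights, means, variances = model
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint) or 1e-300  # guard against underflow to zero
    return [j / total for j in joint]


def fit_gmm_1d(values, n_iter=50, min_var=1e-4):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    # Deterministic start: one component at each extreme of the data.
    weights = [0.5, 0.5]
    means = [lo, hi]
    variances = [(spread / 4.0) ** 2, (spread / 4.0) ** 2]
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        resp = [component_posteriors(x, (weights, means, variances)) for x in values]
        # M-step: re-estimate weight, mean and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)  # floor keeps the density finite
    return weights, means, variances


def is_isolated_formula(fill_ratio, model, threshold=0.5):
    """Assumption: the component with the smaller mean models formula lines."""
    means = model[1]
    formula_component = means.index(min(means))
    return component_posteriors(fill_ratio, model)[formula_component] > threshold


# Hypothetical fill ratios for candidate lines (made-up illustration data).
SAMPLE_FILL_RATIOS = [0.30, 0.35, 0.28, 0.40, 0.33, 0.95, 0.98, 0.92, 0.97, 0.99, 0.96]
MODEL = fit_gmm_1d(SAMPLE_FILL_RATIOS)
```

Calling `is_isolated_formula(0.32, MODEL)` asks whether a line filling about a third of the column is more likely to belong to the low-fill component. A supervised variant would fit one mixture per class on labeled lines and compare their likelihoods, and a realistic system would use several layout features rather than one.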