Research on Semantic Recognition for Mathematical Formula in Digital Chinese Documents (中文电子文档中数学公式的语义识别方法研究)
Abstract: Mathematical formulas in digital Chinese documents have complex structures and contain many special symbols, which makes them difficult for current OCR technology to recognize efficiently. To address this, a new semantic recognition method for formulas is proposed. First, formulas are located in two passes by combining the central moment of character width with a Chinese character rejection method. Next, the projection method and the connected-domain method are used to segment each formula into characters; features such as the number of holes and crossing lines are extracted to build a character template library, and each character in the formula is recognized by template matching. Then, based on the characteristics of five classes of feature characters, seven character-block merging rules, including post-marked (后标型), containing (包含型), and independent (独立型) types, are established to analyze the formula structure and recover its grammatical meaning. Finally, the result of the structural analysis is output as an EQ field (EQ 域) syntax string. Experimental results show that the proposed method can effectively perform semantic analysis of mathematical formulas in digital Chinese documents.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core (北大核心), 2017, No. 10, pp. 2379-2384 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51574004)
Keywords: mathematical formula location; character recognition; structural analysis; semantic recognition
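
The abstract names projection-based character segmentation and the hole count as a character feature but gives no implementation details. The following is a minimal sketch of those two ideas on a tiny binary image, not the authors' implementation. The function names `segment_by_projection`, `crop`, and `count_holes`, the list-of-rows image format (1 = ink, 0 = background), and the toy `IMAGE` are all assumptions made for illustration.

```python
from collections import deque


def segment_by_projection(image):
    """Return (start, end) column ranges of glyphs in a binary image.

    The vertical projection profile counts ink pixels per column;
    columns whose count is zero are treated as gaps between glyphs.
    """
    if not image:
        return []
    width = len(image[0])
    profile = [sum(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a glyph begins here
        elif count == 0 and start is not None:
            segments.append((start, x))    # the glyph ended at the gap
            start = None
    if start is not None:
        segments.append((start, width))    # glyph touching the right edge
    return segments


def crop(image, start, end):
    """Cut columns [start, end) out of every row."""
    return [row[start:end] for row in image]


def count_holes(glyph):
    """Count background regions fully enclosed by ink (4-connectivity)."""
    if not glyph:
        return 0
    h, w = len(glyph), len(glyph[0])
    # Pad with a background border so the outside forms a single region.
    padded = [[0] * (w + 2)]
    padded += [[0] + list(row) + [0] for row in glyph]
    padded += [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]

    def flood(sy, sx):
        queue = deque([(sy, sx)])
        seen[sy][sx] = True
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h + 2 and 0 <= nx < w + 2
                        and not seen[ny][nx] and padded[ny][nx] == 0):
                    seen[ny][nx] = True
                    queue.append((ny, nx))

    flood(0, 0)  # mark the outer background first
    holes = 0
    for y in range(h + 2):
        for x in range(w + 2):
            if padded[y][x] == 0 and not seen[y][x]:
                holes += 1                 # an unreached background region
                flood(y, x)
    return holes


# Toy image: a ring-shaped glyph followed by a vertical bar.
IMAGE = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1],
]

for start, end in segment_by_projection(IMAGE):
    print((start, end), count_holes(crop(IMAGE, start, end)))
```

A plain vertical projection only separates glyphs that have an empty column between them, which may be why the abstract pairs it with a connected-domain method; that second step, the crossing-line feature, and the template matching are not sketched here.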
