Extraction of Mathematical Expressions in Printed Chinese Technical Documents

Original title: 中文科技文档中的数学表达式定位 (cited by: 4)
Abstract: Locating mathematical expressions is a prerequisite for recognizing printed mathematical expressions. For printed Chinese technical documents, new methods are proposed for locating both isolated expressions and embedded expressions. For isolated expressions, features are extracted from each text line, and an adaptive neuro-fuzzy inference system (ANFIS) classifies the lines into two classes: lines of ordinary text and lines of isolated expressions. For embedded expressions, fuzzy clustering and dynamic programming are applied to extract Chinese characters, Chinese punctuation, and English characters from the document in sequence; heuristic rules then merge the remaining mathematical symbols into expressions. Experiments show that the proposed methods locate expressions with high accuracy.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 4, pp. 86-91 (6 pages)
Keywords: artificial intelligence; pattern recognition; mathematical expression extraction; adaptive neuro-fuzzy inference system (ANFIS); fuzzy clustering; Chinese/English separation
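
The last step named in the abstract, merging the leftover mathematical symbols into embedded expressions, can be illustrated with a minimal sketch. The abstract does not state the paper's actual heuristic rules, so the single rule used here (merge symbols on one text line whose horizontal gap is at most a threshold) is an assumption, and the names `Box`, `merge_symbols`, and `max_gap` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical bounding box: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def merge_symbols(boxes: List[Box], max_gap: int) -> List[Box]:
    """Merge symbol boxes on one text line into candidate expressions.

    Assumed rule (not taken from the paper): two symbols belong to the
    same expression when the horizontal gap between them is <= max_gap.
    """
    merged: List[Box] = []
    for box in sorted(boxes):  # tuples sort by left edge first
        if merged and box[0] - merged[-1][2] <= max_gap:
            left, top, right, bottom = merged[-1]
            # Grow the current expression box to cover the new symbol.
            merged[-1] = (
                left,
                min(top, box[1]),
                max(right, box[2]),
                max(bottom, box[3]),
            )
        else:
            # Gap too large: start a new candidate expression.
            merged.append(box)
    return merged


symbols = [(10, 5, 20, 25), (22, 0, 30, 20), (100, 5, 110, 25)]
expressions = merge_symbols(symbols, max_gap=5)
```

With these inputs the first two symbols are 2 pixels apart and are merged, while the third is left as its own candidate. A practical system would likely need further rules, for example for superscripts, fraction bars, or symbols that overlap vertically.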