
Extracting Mathematical Expressions from Scientific and Technical Documents in PostScript Format (cited by: 4)
Abstract: Extracting and recognizing mathematical expressions from scientific and technical documents in PostScript format is a new research direction in the field of mathematical expression recognition. Targeting PS documents generated from Microsoft Word and LaTeX, this paper proposes a content-based method for extracting mathematical expressions. First, several relevant operators of the PS language are redefined so that character and line-segment information can be extracted from the PS document. Next, the characters are analyzed by name, font, and position, while the line segments are connected and recognized, which yields the mathematical symbols. Finally, the symbols are merged according to the spatial relationships between them and a set of heuristic rules, producing the final expressions. Experiments show that the method reaches an accuracy of 98.56%.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2008, No. 11, pp. 157-159, 162 (4 pages)
Keywords: mathematical expression extraction; PostScript; text extraction
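The last two steps of the abstract (flagging mathematical symbols, then merging them by spatial position) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual rules, so the `Glyph` record, the font-name hints in `MATH_FONT_HINTS`, and the `max_gap` and `max_shift` thresholds are all assumptions chosen for illustration. The input is assumed to be the per-character data that the redefined PS operators would emit.

```python
from dataclasses import dataclass

# Assumed font-name fragments that often signal math glyphs (Symbol in Word
# output, Computer Modern math fonts in LaTeX output). Illustrative only;
# the paper's real rule set is not stated in the abstract.
MATH_FONT_HINTS = ("Symbol", "CMMI", "CMSY", "CMEX")


@dataclass
class Glyph:
    name: str     # PostScript character name, e.g. "alpha"
    font: str     # font name in effect when the glyph was shown
    x: float      # left edge of the glyph, in points
    y: float      # baseline position, in points
    width: float  # advance width, in points


def is_math(glyph: Glyph) -> bool:
    """Flag a glyph as mathematical if its font name contains a known hint."""
    return any(hint in glyph.font for hint in MATH_FONT_HINTS)


def merge_symbols(glyphs, max_gap=6.0, max_shift=8.0):
    """Group math glyphs into expressions.

    A glyph joins an existing group when it starts within `max_gap` points
    of the group's last glyph and its baseline differs by at most
    `max_shift` points (which tolerates small super/subscript offsets).
    Both thresholds are hypothetical.
    """
    symbols = sorted((g for g in glyphs if is_math(g)), key=lambda g: (g.x, g.y))
    expressions = []
    for g in symbols:
        for expr in expressions:
            last = expr[-1]
            gap = g.x - (last.x + last.width)
            if gap <= max_gap and abs(g.y - last.y) <= max_shift:
                expr.append(g)
                break
        else:
            # No nearby group on a compatible baseline: start a new expression.
            expressions.append([g])
    return expressions


sample = [
    Glyph("T", "Times-Roman", 0, 100, 6),
    Glyph("alpha", "Symbol", 10, 100, 5),
    Glyph("plus", "Symbol", 17, 100, 5),
    Glyph("beta", "Symbol", 24, 103, 5),
    Glyph("gamma", "Symbol", 200, 100, 5),
    Glyph("delta", "CMMI10", 10, 50, 5),
]
groups = [[g.name for g in expr] for expr in merge_symbols(sample)]
print(groups)
```

A font-name test alone would miss digits and Latin letters set in a text font inside a formula, which is presumably why the abstract also lists character name and position among the features analyzed; a fuller version would add those checks to `is_math`.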

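References (7)

  • 1 Jin Jianming, Jiang Hongying, Wang Qingren. A survey of mathematical formula image processing [J]. Pattern Recognition and Artificial Intelligence, 2005, 18(4): 429-440. Cited by: 7
  • 2 InftyProject. Infty reader version 2.5 [EB/OL]. http://www.inftyproject.org/.
  • 3 Yang M, Fateman R. Extracting mathematical expressions from PostScript documents [C]. Proceedings of the International Symposium on Symbolic and Algebraic Computation. Santander, 2004: 305-311.
  • 4 Adobe Systems Incorporated. PostScript Language Reference, Third Edition [M]. Massachusetts: Addison-Wesley Publishing Company, 1999.
  • 5 Digital Equipment Corporation. Pstotext [EB/OL]. http://www.research.compaq.com/SRC/virtualpaper/pstotext.html.
  • 6 Yang Peng, Tian Xuedong. Research on extracting mathematical formulas from printed documents based on Parzen windows [J]. Computer Engineering and Applications, 2005, 41(23): 200-202. Cited by: 4
  • 7 Zhang Zhiwei, Kong Fanrang, Liu Weilai, Long Qian, Liu Yongbin. Locating mathematical expressions in Chinese scientific and technical documents [J]. Journal of Chinese Information Processing, 2007, 21(4): 86-91. Cited by: 4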

