
A Knowledge-Based Approach to Logical Structure Analysis and Understanding for Multi-Page Documents
Abstract Extracting the logical structure is the most important part of document image understanding. Current research concentrates on page layout analysis; the few studies of logical structure address only single-page documents or multi-page documents whose pages are simply related. Construction tender documents are distinctive in that their hierarchical logical structure is not marked by explicit index information but is implied by citing information. This paper presents a method that uses the reference (citing) relations between pages to extract a document's logical structure. The method represents the logical structure as a modified tree: building the logical tree is itself the process of extracting the logical structure, and the tree also supports higher-level semantic processing and reconstruction of the output. The method has been implemented in VHTender, an automatic tender-document processing system, where it ensures the system's flexibility and efficiency.
Source Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2002, No. 4, pp. 33-37 (5 pages)
Keywords document understanding; document processing; layout analysis; physical structure; logical structure of multi-page documents; knowledge base; office automation
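
The abstract's central idea, deriving a logical tree from the references between pages, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `LogicalNode`, `build_logical_tree`, and `outline`, the page labels, and the rule that a page cited more than once is attached only to the first page that reaches it. It uses a plain tree and does not attempt to reproduce the paper's modified tree structure or its knowledge base.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogicalNode:
    """One page (or logical unit) in the hypothetical logical tree."""
    page: str
    children: list[LogicalNode] = field(default_factory=list)


def build_logical_tree(citations: dict[str, list[str]], root: str) -> LogicalNode:
    """Build a tree in which each page's children are the pages it cites.

    Assumption: a page cited by several pages is attached only once,
    under the first citing page reached in depth-first order.
    """
    seen = {root}

    def expand(page: str) -> LogicalNode:
        node = LogicalNode(page)
        for cited in citations.get(page, []):
            if cited in seen:
                continue  # already placed elsewhere; keep the result a tree
            seen.add(cited)
            node.children.append(expand(cited))
        return node

    return expand(root)


def outline(node: LogicalNode, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines for inspection."""
    lines = ["  " * depth + node.page]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical citing information: page -> pages it refers to.
example_citations = {
    "summary": ["section A", "section B"],
    "section A": ["item A1", "item A2"],
    "section B": ["item A2", "item B1"],
}
example_tree = build_logical_tree(example_citations, "summary")
print("\n".join(outline(example_tree)))
```

In this sketch the tree is built as a side effect of following the citations, which mirrors the abstract's remark that creating the logical tree is the same process as extracting the logical structure.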

