Abstract
Office documents today are usually stored in XML-based formats. Their tree-structured storage holds logical content, format descriptions, page layout descriptions and editing element descriptions. These parts are kept separate from one another yet remain intertwined, which makes document processing complex. This paper analyzes the structural characteristics of office documents and proposes ontology-based methods for operating on documents in two typical application scenarios. Introducing an ontology lets office document processing adapt to different application environments and become intelligent through machine reasoning, and it also benefits interoperability. During processing, nodes can be located more efficiently than with XPath, and in specific applications the document can be processed without breaking its basic structure. Finally, the paper builds an ontology-based document structure model on UOF, the Chinese office document format standard, and uses SWRL reasoning rules to achieve intelligent processing of office documents.
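The abstract does not spell out the model or the rules, so the following Python sketch only illustrates the general idea under stated assumptions. The XML fragment and its names (`para`, `style`, `kind`, `id`) are hypothetical stand-ins, not real UOF vocabulary. A toy set of triples replaces an OWL ontology, one hand-coded forward-chaining rule is written in SWRL style, and a plain id-to-node dictionary stands in for ontology-based node positioning. It is not the authors' implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified document: logical content (paragraphs)
# refers to format descriptions (styles) by id. Element and attribute names
# are illustrative assumptions, not real UOF vocabulary.
DOC = """
<document>
  <styles>
    <style id="s1" kind="heading"/>
    <style id="s2" kind="body"/>
  </styles>
  <body>
    <para id="p1" style="s1">Introduction</para>
    <para id="p2" style="s2">Office documents are stored as XML.</para>
  </body>
</document>
"""

def extract_triples(root):
    """Map XML nodes to (subject, predicate, object) facts of a toy ontology."""
    triples = set()
    for style in root.iter("style"):
        triples.add((style.get("id"), "type", "Style"))
        triples.add((style.get("id"), "styleKind", style.get("kind")))
    for para in root.iter("para"):
        triples.add((para.get("id"), "type", "Paragraph"))
        triples.add((para.get("id"), "hasStyle", para.get("style")))
    return triples

def apply_heading_rule(triples):
    """Forward-chain one hypothetical SWRL-style rule:
    Paragraph(?p) ^ hasStyle(?p, ?s) ^ styleKind(?s, "heading") -> Heading(?p)
    """
    inferred = set()
    for (p, pred, s) in triples:
        if pred != "hasStyle" or (p, "type", "Paragraph") not in triples:
            continue
        if (s, "styleKind", "heading") in triples:
            inferred.add((p, "type", "Heading"))
    return triples | inferred

root = ET.fromstring(DOC.strip())
# Index nodes by id once, so later operations locate a node with a single
# dict lookup instead of evaluating a path expression over the tree.
index = {el.get("id"): el for el in root.iter() if el.get("id")}

facts = apply_heading_rule(extract_triples(root))
headings = sorted(s for (s, p, o) in facts if p == "type" and o == "Heading")
for hid in headings:
    # Change only the text of the located node; the tree structure is untouched.
    index[hid].text = index[hid].text.upper()

print(headings)
print(index["p1"].text)
```

Running it prints `['p1']` and `INTRODUCTION`: the rule classifies the first paragraph as a heading through the style it references, and the edit touches only that node's text, leaving the element tree and the style definitions as they were.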
Source
Journal of Beijing Information Science and Technology University (Natural Science Edition)
2010, Issue S2, pp. 97-102, 142 (7 pages)
Funding
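Key Science and Technology Development Project of the Beijing Municipal Education Commission and Beijing Natural Science Foundation (KZ200810772017)
Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (PHR201007131)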
Keywords
office document
ontology
intelligent operation
machine understanding
UOF