Abstract
Tags are an important means of expressing and controlling the content of XML documents, but user-defined tags are often semantically ambiguous. Disambiguating these tags is a prerequisite for computing the semantic similarity between XML documents, and it also underlies the automatic clustering and classification of XML documents. Unlike traditional dictionaries, WordNet arranges words in a tree-like hierarchy, which resembles the tag tree obtained by parsing an XML document and makes WordNet a convenient tool for word sense disambiguation. This paper reviews existing word sense disambiguation algorithms, analyzes the feasibility of WordNet-based word sense disambiguation for XML document tags, and describes the procedure in detail. The experimental results indicate that the method achieves relatively high disambiguation accuracy.
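The abstract does not spell out the disambiguation procedure, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a path-based similarity over a hypernym tree and uses the neighbouring tags in the XML tag tree as context. The `HYPERNYM` tree, the `SENSES` inventory, and all sense names are hypothetical toy data standing in for WordNet, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy hypernym tree (child sense -> parent sense).
# This stands in for WordNet; it is NOT real WordNet data.
HYPERNYM = {
    "entity": None,
    "communication": "entity",
    "publication": "communication",
    "heading": "communication",
    "book.publication": "publication",
    "chapter.section": "publication",
    "title.heading": "heading",
    "state": "entity",
    "status": "state",
    "title.championship": "status",
}

# Hypothetical sense inventory: tag name -> candidate senses.
SENSES = {
    "title": ["title.heading", "title.championship"],
    "book": ["book.publication"],
    "chapter": ["chapter.section"],
}


def ancestors(sense):
    """Return the chain from a sense up to the root of the toy tree."""
    chain = []
    while sense is not None:
        chain.append(sense)
        sense = HYPERNYM[sense]
    return chain


def path_similarity(a, b):
    """1 / (1 + number of edges between a and b via their lowest common ancestor)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for i, node in enumerate(chain_a):
        if node in chain_b:
            return 1.0 / (1 + i + chain_b.index(node))
    return 0.0


def context_tags(root, target):
    """Collect the tags of the target's children, parent, and siblings."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    tags = [child.tag for child in target]
    parent = parent_of.get(target)
    if parent is not None:
        tags.append(parent.tag)
        tags.extend(sib.tag for sib in parent if sib is not target)
    return tags


def disambiguate(tag, context):
    """Pick the sense of `tag` that is closest, in total, to the context tags."""
    def score(sense):
        return sum(
            max((path_similarity(sense, c) for c in SENSES.get(ctx, [])), default=0.0)
            for ctx in context
        )
    return max(SENSES[tag], key=score)


doc = ET.fromstring("<book><title>XML Basics</title><chapter/></book>")
title = doc.find("title")
best_sense = disambiguate(title.tag, context_tags(doc, title))
print(best_sense)
```

In this toy setting the `title` tag sits under `book` next to `chapter`, so the sense that shares a closer ancestor with those tags scores higher. A real implementation would replace the toy dictionaries with WordNet lookups and might weight context tags by their distance in the tag tree.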
Source
Information Science (《情报科学》)
Indexed in: CSSCI; Peking University Core Journals (北大核心)
2014, No. 3, pp. 116-120 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 70803046)