Abstract
[Objective] This paper proposes a multi-strategy word sense disambiguation (WSD) method based on Wikipedia that makes full use of the latent knowledge in Wikipedia. [Methods] Three indicators are designed: category commonness (consistency), content relatedness, and word sense importance. Their values are fused linearly with dynamic entropy weights, and a re-disambiguation step is then applied to determine the best sense of an ambiguous word in its context. [Results] In experiments, the method achieves an average precision of 74.82%, which supports its feasibility and effectiveness. [Limitations] The candidate senses are fine-grained, and the method mainly targets English, so its generality to other languages is limited. [Conclusions] Wikipedia provides additional semantic knowledge and background information for disambiguation and can improve disambiguation precision.
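The fusion step described under [Methods] can be illustrated with the standard entropy weight method. This is a minimal sketch, not the authors' implementation. It assumes that each candidate sense already has three non-negative indicator scores (category commonness, content relatedness, sense importance) and that "dynamic" means the weights are recomputed from each ambiguous word's own candidate set. The function names `entropy_weights` and `fuse_scores`, the sum-normalization, and the sample values for "bank" are all hypothetical. The re-disambiguation step is not modelled.

```python
import math
from typing import Dict, List, Sequence


def entropy_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight of each indicator (column) over the candidate senses (rows).

    Assumption: all indicator values are non-negative.
    """
    m = len(matrix)
    n = len(matrix[0])
    if m < 2:
        # Entropy cannot discriminate with a single candidate: use equal weights.
        return [1.0 / n] * n
    k = 1.0 / math.log(m)  # scales entropy into [0, 1]
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            divergences.append(0.0)  # indicator carries no information for this word
            continue
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:  # treat 0 * ln 0 as 0
                entropy -= p * math.log(p)
        # Low entropy (uneven spread across senses) means a more discriminative indicator.
        divergences.append(max(0.0, 1.0 - k * entropy))
    s = sum(divergences)
    if s <= 1e-12:
        return [1.0 / n] * n  # no indicator discriminates: fall back to equal weights
    return [d / s for d in divergences]


def fuse_scores(candidates: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Linearly fuse sum-normalized indicator values with per-word entropy weights."""
    senses = list(candidates)
    matrix = [list(candidates[s]) for s in senses]
    weights = entropy_weights(matrix)  # recomputed for this word's candidate set
    col_sums = [sum(row[j] for row in matrix) for j in range(len(weights))]
    return {
        sense: sum(
            w * (x / t if t > 0 else 0.0)
            for w, x, t in zip(weights, row, col_sums)
        )
        for sense, row in zip(senses, matrix)
    }


if __name__ == "__main__":
    # Hypothetical indicator values: [category, relatedness, importance].
    bank = {
        "Bank (finance)": [0.8, 0.6, 0.9],
        "Bank (geography)": [0.2, 0.5, 0.1],
    }
    scores = fuse_scores(bank)
    print(max(scores, key=scores.get))
```

Here the relatedness column (0.6 vs 0.5) barely separates the two senses, so it receives a near-zero weight, while the category and importance columns dominate the fused score. Recomputing the weights per word lets the same indicator count heavily for one ambiguous word and little for another.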
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2015, No. 11, pp. 18-25 (8 pages)
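Funding
Beijing Natural Science Foundation Pre-exploration Project "Research on Concept Map Representation of the Invention Process and Mechanism" (Project No. 9153020)
This paper is one of the research outcomes of the 2015 Beijing Municipal Education Commission Social Science Program General Project "A Concept-Map-Based Method for Describing the Mechanism of the Invention Process" (Project No. SM201510005001)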
Keywords
Word sense disambiguation
Wikipedia
Relatedness
Entropy weight
Re-disambiguation