Abstract
In China, HowNet-based computation of Chinese word similarity usually relies on sememe distance and depth, so the results depend on how the formula is designed and which parameters are chosen. This paper proposes computing Chinese word similarity from the information content of HowNet sememes instead. Following the information-theoretic view of the similarity between two objects, the method uses HowNet's taxonomy to compute the information content of each sememe, and then combines three components into a word similarity score: the information content of the main-category sememe of each word's concept, the information content of sememes together with their role relations (event roles), and sememe node similarity. In a comparison with Liu Qun's method and the HowNet online method, the experimental results show that the proposed method agrees more closely with human judgment.
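The abstract does not state the paper's formulas, so the following is only a minimal sketch of the general idea it describes: deriving a sememe's information content from its position in a taxonomy and comparing two sememes in the information-theoretic style. It assumes an intrinsic information content based on descendant counts and a Lin-style similarity ratio; both are swapped-in, commonly used choices, not the authors' definitions. The toy taxonomy and every name in it (`PARENT`, `information_content`, `sememe_similarity`) are hypothetical and are not real HowNet data or a HowNet API.

```python
import math

# Hypothetical toy taxonomy (child -> parent). NOT real HowNet data.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "inanimate": "thing",
    "artifact": "inanimate",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def descendant_count(node):
    """Count the strict descendants of node in the toy taxonomy."""
    return sum(1 for n in PARENT if n != node and node in ancestors(n))


def information_content(node):
    """Assumed intrinsic IC: 1 - log(descendants + 1) / log(total nodes).

    The root gets 0 (least informative); leaves get 1 (most informative).
    """
    return 1.0 - math.log(descendant_count(node) + 1) / math.log(len(PARENT))


def lowest_common_ancestor(a, b):
    """Return the deepest node that is an ancestor of both a and b."""
    ancestors_of_b = set(ancestors(b))
    for n in ancestors(a):
        if n in ancestors_of_b:
            return n
    return None  # unreachable in a single-rooted tree


def sememe_similarity(a, b):
    """Assumed Lin-style similarity: 2 * IC(lca) / (IC(a) + IC(b))."""
    denom = information_content(a) + information_content(b)
    if denom == 0.0:
        # Both nodes carry zero information (only the root in this toy tree).
        return 1.0 if a == b else 0.0
    return 2.0 * information_content(lowest_common_ancestor(a, b)) / denom


if __name__ == "__main__":
    for pair in [("human", "animal"), ("human", "artifact"), ("human", "human")]:
        print(pair, round(sememe_similarity(*pair), 4))
```

In this toy tree, "human" and "animal" share the fairly specific ancestor "animate" and therefore score higher than "human" and "artifact", whose only shared ancestor is the much more general "thing". A full word-level score in the spirit of the abstract would additionally combine such sememe scores across the three components it lists, and the abstract does not say how those components are weighted.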
Source
Computer and Information Technology (《电脑与信息技术》)
2015, No. 3, pp. 21-24, 63 (5 pages in total)
Keywords
information content of sememe (义原信息量)
concept similarity (概念相似度)
node similarity (结点相似度)
word similarity (词语相似度)