Abstract
Computing the semantic similarity of Chinese words is a key problem in Chinese information processing. This paper proposes a new HowNet-based, semantics-oriented, and extensible method for similarity computation. Starting from information theory, the method defines a similarity formula between HowNet sememes. It then applies concept segmentation and automatic semantic generation to out-of-vocabulary (OOV) words, which resolves the difficulty that OOV words cannot take part in semantic computation and makes semantic-level similarity computation possible for arbitrary words. Experiments on Tongyici Cilin (CILIN) show that the accuracy of the method is nearly 15 percentage points higher than that of existing methods.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2007, No. 6, pp. 191-194 (4 pages)
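Funding
Renmin University of China Scientific Research Youth Fund
Open Project Fund of the Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education (Renmin University of China)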
Keywords
Word similarity
HowNet
Concept
Sememe