Abstract
A common approach to measuring semantic similarity or distance between words and concepts is to use a lexical taxonomy such as WordNet. HowNet (知网) is a Chinese semantic dictionary that contains abundant semantic information and ontological knowledge, but its construction and architecture differ considerably from those of WordNet. In this paper, we first use the tree-structured hierarchy of "sememes" in HowNet to obtain the similarity between sememes, and then derive the similarity between words ("concepts") from the sememe similarities. Drawing on the notion of information content from information theory, we propose that how much a sememe contributes to describing a concept depends on the amount of semantic information the sememe itself carries: the more semantic information it carries, the greater its descriptive power. We further divide the sememes that describe a concept into two sets, those that describe it directly and those that describe it indirectly, and compute the semantic similarity of Chinese words on this basis. Experiments show that the method yields results that are, to some extent, more consistent with human intuition.
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2007, Issue 3, pp. 99-105 (7 pages)
Journal of Chinese Information Processing
Keywords
computer application
Chinese information processing
word semantic similarity
HowNet
"sememe"
semantic information content