Abstract
To capture the degree of semantic relatedness between words, a new method for constructing a semantic library based on the vector space model is proposed. The library is built by iterative learning over a large corpus of texts, with an elimination algorithm applied during learning. The construction takes into account several factors that affect the semantic relationship between words, including co-occurrence count, average co-occurrence distance, information entropy, and the semantic information carried by individual Chinese characters. Experiments show that the resulting semantic library reflects the relatedness between words in the real world reasonably well.
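As a rough illustration of three of the factors named in the abstract (co-occurrence count, average distance, and information entropy), the following minimal sketch gathers them from a tokenized corpus. It is not the authors' method: the window size, the function names, and the way the factors are combined in `relatedness` are assumptions made for illustration only, and the elimination algorithm, the iterative training, and the single-character semantics are not modelled.

```python
import math
from collections import Counter


def pair_statistics(sentences, window=5):
    """Count co-occurrences and average token distance for word pairs.

    `sentences` is a list of token lists; `window` (an assumed parameter)
    is the maximum distance at which two tokens count as co-occurring.
    """
    counts = Counter()
    dist_sums = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                right = tokens[j]
                if left == right:
                    continue
                pair = tuple(sorted((left, right)))  # unordered pair key
                counts[pair] += 1
                dist_sums[pair] += j - i
    avg_dist = {pair: dist_sums[pair] / counts[pair] for pair in counts}
    return counts, avg_dist


def neighbour_entropy(counts, word):
    """Shannon entropy (in bits) of the word's co-occurrence distribution."""
    freqs = [c for pair, c in counts.items() if word in pair]
    total = sum(freqs)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in freqs)


def relatedness(counts, avg_dist, a, b):
    """Hypothetical score: frequent, close pairs rank higher, and words
    whose neighbours are spread evenly (high entropy) are discounted."""
    pair = tuple(sorted((a, b)))
    if pair not in counts:
        return 0.0
    penalty = 1.0 + neighbour_entropy(counts, a) + neighbour_entropy(counts, b)
    return counts[pair] / (avg_dist[pair] * penalty)
```

Calling `pair_statistics` on a segmented corpus and then `relatedness(counts, avg_dist, "cat", "fish")` yields a non-negative score that is zero for words never seen within the window of each other.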
Source
《上海交通大学学报》 (Journal of Shanghai Jiaotong University)
EI
CAS
CSCD
PKU Core Journals (北大核心)
2008, No. 7, pp. 1129-1132 (4 pages)
Funding
Cooperative research project of the Hitachi-SJTU School of Software Digital Home Appliance Laboratory
Keywords
semantic library
vector space
word relativity
information entropy
text training