Abstract
Word similarity computation is widely used in research areas such as information retrieval, document topic extraction, text classification, and machine translation. There are generally two approaches to computing the similarity between words: statistics-based methods and methods based on world knowledge (ontologies). For Chinese, an existing method uses HowNet: it computes the similarity between HowNet primitives (sememes) and derives the similarity between words from it. However, when computing primitive similarity, that method ignores both the depth of a primitive in the hierarchy tree and the density of the region around it. Building on that method, we study the HowNet primitive hierarchy in depth and add these two factors, depth and regional density, to the primitive similarity computation. Finally, we implement the method and compare its experimental results with those of the original method. The results that take depth and regional density into account are closer to what is observed in practice than the results that ignore these two factors.
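The idea in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the toy hierarchy is not real HowNet data, the baseline uses the commonly cited distance-based form alpha / (d + alpha) with alpha set to 1.6, and the depth and density adjustment is one arbitrary weighting, not the formula from the paper.

```python
# Toy primitive hierarchy (child -> parent). Hypothetical, not real HowNet data.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "inanimate": "thing",
}

ALPHA = 1.6  # adjustable parameter of the distance-based baseline (assumed value)


def path_to_root(node):
    """Return the nodes from `node` up to the root, `node` first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth in the hierarchy tree; the root has depth 0."""
    return len(path_to_root(node)) - 1


def lowest_common_ancestor(a, b):
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def distance(a, b):
    """Number of edges on the tree path between a and b."""
    lca = lowest_common_ancestor(a, b)
    return depth(a) + depth(b) - 2 * depth(lca)


def baseline_similarity(a, b):
    """Distance-only similarity: alpha / (d + alpha)."""
    return ALPHA / (distance(a, b) + ALPHA)


def child_count(node):
    """Local density proxy: number of direct children of a node."""
    return sum(1 for parent in PARENT.values() if parent == node)


def adjusted_similarity(a, b):
    """Hypothetical adjustment: scale the baseline by the depth and the
    local density of the lowest common ancestor. Illustrative only."""
    if a == b:
        return 1.0
    lca = lowest_common_ancestor(a, b)
    max_depth = max(depth(n) for n in PARENT)
    max_children = max(child_count(n) for n in PARENT)
    depth_factor = (depth(lca) + 1) / (max_depth + 1)
    density_factor = (child_count(lca) + 1) / (max_children + 1)
    weight = 0.5 * depth_factor + 0.5 * density_factor  # in (0, 1]
    return baseline_similarity(a, b) * (0.5 + 0.5 * weight)


if __name__ == "__main__":
    for pair in [("human", "animal"), ("human", "inanimate")]:
        print(pair, baseline_similarity(*pair), adjusted_similarity(*pair))
```

In this sketch, two pairs at the same path distance can receive different scores when their common ancestor sits at a different depth or in a more densely populated part of the tree, which is the kind of distinction the distance-only baseline cannot make.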
Source
《辽宁大学学报(自然科学版)》 (Journal of Liaoning University: Natural Sciences Edition)
CAS
2011, No. 4, pp. 358-361 (4 pages)
Keywords
HowNet
Primitive (sememe)
Similarity
Natural Language Processing