Abstract
Place names carry special weight when computing the similarity of sentences that mention them. If place names are handled improperly, so that neither their importance to the sentence nor the differences between individual place names are reflected, the resulting similarity scores are unreasonable. This paper proposes an improved method for computing Chinese sentence similarity. To compute the similarity of place-name words, the method uses the hierarchical relationships of place names in a geographic tree together with latitude and longitude coordinates obtained from the Baidu Maps API. To compute the similarity of non-place-name words, it uses the HowNet knowledge base. Place-name words and non-place-name words are assigned different weights to reflect the importance of place names, which yields an overall semantic similarity for the sentence pair. This semantic similarity is then combined with the structural similarity of the sentences to give a combined sentence similarity. Experiments show that the improved method gives better results when computing the similarity of sentences involving place names.
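The abstract does not give the formulas, so the following is only a minimal sketch of how such a pipeline could be wired together, not the authors' method. It assumes that each place name comes with a root-to-node path in a geographic tree and a (latitude, longitude) pair already fetched from a map service, and that the HowNet-based word similarity and the structural similarity are computed elsewhere and passed in as numbers. The function names, the exponential distance decay, and the parameters `alpha`, `scale_km`, `w_place`, and `beta` are all hypothetical choices made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def tree_similarity(path_a, path_b):
    """Hierarchy similarity: shared ancestors relative to the two path lengths.

    Each path is a root-to-node list such as ["China", "Zhejiang", "Hangzhou"].
    """
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return 2 * common / (len(path_a) + len(path_b))


def place_similarity(path_a, coord_a, path_b, coord_b, alpha=0.5, scale_km=1000.0):
    """Blend hierarchy similarity with a distance-decay term (assumed form)."""
    distance_term = math.exp(-haversine_km(coord_a, coord_b) / scale_km)
    return alpha * tree_similarity(path_a, path_b) + (1 - alpha) * distance_term


def sentence_similarity(place_sim, other_sim, structure_sim, w_place=0.6, beta=0.8):
    """Weight place-name words above other words, then mix in structure (assumed weights)."""
    semantic = w_place * place_sim + (1 - w_place) * other_sim
    return beta * semantic + (1 - beta) * structure_sim


if __name__ == "__main__":
    hangzhou = (["China", "Zhejiang", "Hangzhou"], (30.27, 120.15))
    ningbo = (["China", "Zhejiang", "Ningbo"], (29.87, 121.55))
    p = place_similarity(hangzhou[0], hangzhou[1], ningbo[0], ningbo[1])
    # other_sim and structure_sim are placeholder inputs, not measured values.
    print(sentence_similarity(p, other_sim=0.7, structure_sim=0.9))
```

Setting `w_place` above 0.5 is what makes a mismatch in place names lower the score more than a mismatch in ordinary words. The actual weights and the way the tree and coordinate terms are combined would have to be taken from the full paper.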
Source
《浙江工业大学学报》
CAS
Peking University Core Journals (北大核心)
2015, Issue 6, pp. 624-629 (6 pages)
Journal of Zhejiang University of Technology
Keywords
sentence similarity
place names
geographical tree
latitude and longitude coordinates