Abstract
Similarity computation has broad application prospects in fields such as information retrieval and document copy detection. This paper studies methods for computing text similarity. Building on HowNet (知网) semantic similarity, it extends text similarity computation based on semantic understanding to the paragraph level, which in turn allows paragraph similarity to be extended to whole-document similarity. Formulas and algorithms are given for computing the similarity between two texts at the word, sentence, and paragraph levels. Worked examples indicate that the algorithm is more accurate than existing typical similarity computation methods.
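The abstract describes lifting similarity from words to sentences, then to paragraphs, then to whole documents. The sketch below illustrates only that layered structure. It is not the paper's formula, which is not given in this record. The aggregation rule (a symmetric average of best matches) and every function name are assumptions made for illustration, and exact_match is a placeholder standing in for a HowNet-based word similarity.

```python
from typing import Callable, Sequence

# A word-similarity function returns a score in [0, 1].
WordSim = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Placeholder word similarity (assumption): 1.0 if identical, else 0.0.

    A HowNet-based semantic similarity would be plugged in here instead.
    """
    return 1.0 if a == b else 0.0


def _best_match_average(xs: Sequence, ys: Sequence, sim: Callable) -> float:
    """Average, over items of xs, of the best similarity to any item of ys."""
    if not xs or not ys:
        return 0.0
    return sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)


def _symmetric(xs: Sequence, ys: Sequence, sim: Callable) -> float:
    """Average both directions so that the score does not depend on order."""
    forward = _best_match_average(xs, ys, sim)
    backward = _best_match_average(ys, xs, sim)
    return 0.5 * (forward + backward)


def sentence_similarity(s1: Sequence[str], s2: Sequence[str],
                        word_sim: WordSim = exact_match) -> float:
    """Sentences are sequences of words; aggregate word similarities."""
    return _symmetric(s1, s2, word_sim)


def paragraph_similarity(p1: Sequence[Sequence[str]],
                         p2: Sequence[Sequence[str]],
                         word_sim: WordSim = exact_match) -> float:
    """Paragraphs are sequences of sentences; aggregate sentence scores."""
    return _symmetric(
        p1, p2, lambda a, b: sentence_similarity(a, b, word_sim))


def document_similarity(d1, d2, word_sim: WordSim = exact_match) -> float:
    """Documents are sequences of paragraphs; aggregate paragraph scores."""
    return _symmetric(
        d1, d2, lambda a, b: paragraph_similarity(a, b, word_sim))
```

Each level reuses the same aggregation over the level below it, so replacing exact_match with a semantic word similarity changes the scores at every level without touching the rest of the code.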
Source
《大连理工大学学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2005, No. 2, pp. 291-297 (7 pages)
Journal of Dalian University of Technology
Funding
Research derived from projects supported by the National Natural Science Foundation of China (Grant Nos. 60073036, 50275019).