Similarity Algorithm Based on Common Chunks Between English Short Texts (cited by: 7)
Abstract: Short-text similarity computation is an active research topic in natural language processing. Traditional term-based text similarity algorithms consider only the terms themselves and ignore the effect of word order on short-text similarity. This paper proposes a short-text similarity method based on common chunks that takes both terms and word order into account. It combines an overlap similarity based on shared terms with a word-order similarity computed over the common chunks, and obtains the final similarity value through an adaptive weighted combination of the two. Experimental results show that, compared with traditional algorithms, the proposed algorithm performs better in both stability and F-measure (the harmonic mean of precision and recall).
Source: Journal of Chongqing University of Technology: Natural Science (CAS), 2015, No. 8, pp. 88-93 (6 pages)
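Funding: National Natural Science Foundation of China (61173184); Science and Technology Program of the Chongqing Municipal Education Commission (KJ100821); Chongqing University of Technology Graduate Innovation Fund (YCX2014227)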
Keywords: short text; word order; common chunks; similarity algorithm
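
The abstract describes the method only in outline, so the following Python sketch is an illustration under stated assumptions rather than the paper's actual formulas. It assumes whitespace tokenisation, the overlap coefficient as the term-based score, greedy extraction of non-overlapping common word runs as the "common chunks", the share of chunk pairs that keep their relative order as the word-order score, and a hypothetical adaptive weight `alpha` that grows with the share of words covered by common chunks. All function names are invented for this example.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenisation (an assumption, not from the paper)."""
    return text.lower().split()


def overlap_similarity(a, b):
    """Overlap coefficient on the token sets: |A & B| / min(|A|, |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def common_chunks(a, b):
    """Greedily extract non-overlapping common contiguous word runs.

    Returns (start_in_a, start_in_b, length) tuples, longest runs first.
    """
    used_a, used_b = [False] * len(a), [False] * len(b)
    chunks = []
    while True:
        best = (0, 0, 0)  # (length, start_in_a, start_in_b)
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and not used_a[i + k] and not used_b[j + k]
                       and a[i + k] == b[j + k]):
                    k += 1
                if k > best[0]:
                    best = (k, i, j)
        length, i, j = best
        if length == 0:
            return chunks
        for k in range(length):
            used_a[i + k] = used_b[j + k] = True
        chunks.append((i, j, length))


def order_similarity(chunks):
    """Fraction of chunk pairs whose relative order agrees in both texts."""
    if not chunks:
        return 0.0
    if len(chunks) == 1:
        return 1.0
    pairs = list(combinations(chunks, 2))
    concordant = sum(
        1 for (a1, b1, _), (a2, b2, _) in pairs
        if (a1 - a2) * (b1 - b2) > 0
    )
    return concordant / len(pairs)


def short_text_similarity(text_a, text_b):
    """Weighted combination of the term-overlap and word-order scores."""
    a, b = tokenize(text_a), tokenize(text_b)
    chunks = common_chunks(a, b)
    covered = sum(length for _, _, length in chunks)
    # Hypothetical adaptive weight in [0, 0.5]: the more words the common
    # chunks cover, the more the word-order score counts.
    alpha = covered / (len(a) + len(b)) if (a or b) else 0.0
    return (1 - alpha) * overlap_similarity(a, b) + alpha * order_similarity(chunks)


if __name__ == "__main__":
    print(short_text_similarity("the dog bit the man", "the man bit the dog"))
```

Under these assumptions, the two example sentences share every term, so a purely term-based score rates them as identical, while the combined score drops to 0.5 because all common chunks appear in reversed order.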