Abstract
【Purpose/significance】 Text similarity calculation is a fundamental research topic in natural language processing. By summarizing and analyzing both the classical methods and the latest research results, this paper consolidates the systematic study of text similarity calculation methods so that readers can learn and master them quickly. 【Method/process】 We collate the classical literature on text similarity calculation from the past 20 years, analyze the basic ideas, advantages, and disadvantages of the different calculation methods, and summarize the emphasis of each method and the latest research progress in different directions. 【Result/conclusion】 The methods are discussed from two perspectives, surface text similarity calculation and semantic similarity calculation, forming a fairly comprehensive classification system. Among the semantic similarity methods, the corpus-based approach is the principal research direction in this field.
Authors
王春柳
杨永辉
邓霏
赖辉源
WANG Chun-liu; YANG Yong-hui; DENG Fei; LAI Hui-yuan (Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621000, China)
Source
《情报科学》
CSSCI
Peking University Core Journals
2019, Issue 3, pp. 158-168 (11 pages)
Information Science
Funding
Key Project of the National Defense Basic Scientific Research Program (JCKY2016212B004)
Keywords
text similarity
semantic similarity
corpus-based
review