
Methodological System and Application Scenarios on Text Similarity Calculation (计算文本相似度的方法体系与应用分析)

Cited by: 14
Abstract: [Purpose/significance] The similarity between texts is one of the core reference metrics in tasks such as information retrieval, document detection, and text mining. This review covers existing methods for calculating text similarity, their classification system, and their applications. It is intended to help researchers choose a suitable method and improve performance in a specific application scenario. [Method/process] The paper builds a methodological system from four criteria: (1) the degree to which an algorithm uses the semantic information of the text, (2) the type of underlying semantic information, (3) the model type, and (4) the type of association relationship. It then compares the similarities and differences among algorithms in terms of both principle and application. [Result/conclusion] Text similarity calculation methods are divided into three major categories: methods that use no semantic information, methods based on shallow semantic information, and methods based on deep semantic information. For each category, the paper analyzes the semantic information it draws on, the basic principle of its algorithms, and its typical applications. [Originality/value] The result is a clearer and more complete system of text similarity calculation methods. It enables researchers to better distinguish the computational requirements of different similarity methods and to match a method to their application scenario.
Authors: Huang Wenbin (黄文彬), Che Shangkun (车尚锟)
Source: Information Studies: Theory & Application (《情报理论与实践》), a CSSCI-indexed Peking University core journal, 2019, no. 11, pp. 128-134 (7 pages)
Keywords: text mining; text similarity; classification system; semantic information; application
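The abstract does not name the specific algorithms in each category. The sketch below assumes that purely string-level measures are typical members of the "no semantic information" category. It illustrates two such measures: Jaccard overlap of character bigrams and a length-normalized Levenshtein (edit-distance) similarity. All function names are hypothetical, and this is not presented as the authors' method.

```python
# Hypothetical sketch: string-level similarity measures that use no semantic
# information (assumed examples; the paper's own algorithm list is not shown here).


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of overlapping character n-grams in text."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of character n-gram sets: |A & B| / |A | B|."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(union)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string, so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))          # 3 edits
    print(jaccard_similarity("night", "nacht"))      # only "ht" is shared: 1/7
    print(jaccard_similarity("car", "automobile"))   # synonyms share no bigrams: 0.0
```

Both measures compare only surface characters. As a result, synonyms such as "car" and "automobile" score 0.0. This illustrates why measures that draw on shallow or deep semantic information can be preferable when meaning, rather than spelling, matters.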