
Automatic Abstracting of Chinese Document with Doc2Vec and Improved Clustering Algorithm (cited by: 18)
Abstract [Objective] This paper introduces the deep neural network model Doc2Vec to capture the contextual information of a text, and combines it with an improved K-means clustering algorithm to extract abstracts from single Chinese documents. [Methods] The Doc2Vec model extracts the semantic, syntactic, and word-order features of each sentence and converts the sentence into a vector of fixed dimension. Initial cluster centers for the K-means algorithm are selected on the principle of maximum density and farthest distance, and the sentence vectors are then clustered. Within each cluster, the information entropy of the sentences is computed, and sentences that are highly similar to the other sentences in the cluster are extracted as abstract sentences. [Results] Compared with PLSA, a traditional vector representation method, the proposed method improved precision, recall, and F value by 9.57%, 7.62%, and 10.30% respectively. [Limitations] The extracted sentences come from the body text, whereas a reference abstract is a highly condensed summary of the text, so the two usually cannot match exactly. [Conclusions] The experimental results show that, compared with common vector representation methods, the proposed method improves automatic abstracting fairly significantly and offers an approach for multi-document automatic abstracting.
Authors Jia Xiaoting (贾晓婷)1, Wang Mingyang (王名扬)1, Cao Yu (曹宇)2 (1 College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China; 2 Tongfang Knowledge Network, Beijing 100192, China)
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2018, No. 2, pp. 86-95 (10 pages)
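Funding This work is one of the research outputs of the Fundamental Research Funds for the Central Universities project "Research on Early-Warning Methods for Mass Emergencies Based on Social Network Feature Extraction" (Grant No. 2572014DB05) and the National Natural Science Foundation of China project "Research on Supernetwork Methods for Early Warning of Mass Emergencies" (Grant No. 71473034).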
Keywords Automatic Abstracting; Doc2Vec; K-means Clustering; Information Entropy
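
The clustering and selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The sentence vectors are assumed to be precomputed (for example by a Doc2Vec model); here they are toy 2-D points. The function names, the `radius` parameter, and the seeding score (neighbour count times distance to the nearest chosen center) are hypothetical readings of "maximum density and farthest distance". The information-entropy step is omitted because the abstract does not define it.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_initial_centers(vectors, k, radius):
    """Assumed seeding rule: start from the densest point, then repeatedly
    add the point that is both dense and far from the centers chosen so far."""
    n = len(vectors)
    # Density = number of other points within `radius` (hypothetical definition).
    density = [
        sum(1 for j in range(n) if j != i and euclidean(vectors[i], vectors[j]) <= radius)
        for i in range(n)
    ]
    centers = [max(range(n), key=lambda i: density[i])]
    while len(centers) < k:
        def score(i):
            nearest = min(euclidean(vectors[i], vectors[c]) for c in centers)
            return (density[i] + 1) * nearest  # +1 keeps isolated points eligible
        candidates = [i for i in range(n) if i not in centers]
        centers.append(max(candidates, key=score))
    return centers


def kmeans(vectors, seed_indices, iterations=20):
    """Plain K-means started from the given seed points; returns cluster labels."""
    centers = [list(vectors[i]) for i in seed_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda c: euclidean(v, centers[c]))
            for v in vectors
        ]
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def representative_sentences(vectors, labels):
    """Per cluster, pick the sentence with the highest mean cosine similarity
    to the other sentences in that cluster; returns sorted sentence indices."""
    picks = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]

        def mean_similarity(i):
            others = [j for j in members if j != i]
            if not others:
                return 0.0
            return sum(cosine(vectors[i], vectors[j]) for j in others) / len(others)

        picks.append(max(members, key=mean_similarity))
    return sorted(picks)


# Toy "sentence vectors": two well-separated groups of three sentences each.
vectors = [(1.0, 0.1), (0.9, 0.2), (1.0, 0.0), (0.1, 1.0), (0.2, 0.9), (0.0, 1.0)]
seeds = pick_initial_centers(vectors, k=2, radius=0.3)
labels = kmeans(vectors, seeds)
print(representative_sentences(vectors, labels))
```

Returning the chosen indices in sorted order keeps the extracted sentences in their original document order, which is a common choice for extractive summaries rather than something the abstract specifies.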