SOM-NCSCM+: Research on a Chinese Headline Generation Method Based on an Extractive Neural Network
Abstract: Headline generation, a branch of text summarization, helps people obtain information efficiently. Chinese headline generation suffers from a lack of large-scale, high-quality annotated Chinese data. To address this, the paper exploits the observation that a headline can often be composed of words from the source text, and it combines unsupervised learning models with a supervised sequence labeling model. The result is an extractive deep neural network method and model for Chinese headline generation that integrates a clustering model and a topic model. On Chinese news datasets that lack manually annotated category labels, the model uses the clustering and topic models to automatically mine latent feature information within the data, obtaining distinct data clusters and the topic words within each cluster to assist headline generation. This makes the model well suited to Chinese news datasets that have latent topic categories and headlines of uneven quality. Experiments on a publicly available Chinese news headline dataset show that the proposed model outperforms the baseline models on evaluation metrics including micro F1, BLEU, ROUGE, and compression ratio.
Authors: ZI Kangli (资康莉), WANG Shi (王石), CAO Cungen (曹存根) (Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
Source: Chinese High Technology Letters (《高技术通讯》), CAS, 2023, No. 8, pp. 836-848 (13 pages)
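Funding: Supported by the National Key Research and Development Program of China (2022YFC3302300) and the National 242 Information Security Program (2022A056).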
Keywords: Chinese headline generation; neural network model; topic model; clustering model; sequence labeling
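
The abstract frames extractive headline generation as sequence labeling: each word of the source text is either kept for the headline or dropped. The sketch below illustrates that framing only, together with two of the metrics the abstract names. It is not the authors' model. The function names, the 0/1 label scheme, the toy sentence, and the definition of compression ratio as kept tokens divided by source tokens are all assumptions made for illustration; the paper may define them differently.

```python
from typing import List, Sequence


def extract_headline(tokens: Sequence[str], labels: Sequence[int]) -> str:
    """Join the tokens labelled 1 (keep), preserving source order.

    Assumption: labels are binary, 1 = keep the token, 0 = drop it.
    """
    return "".join(tok for tok, keep in zip(tokens, labels) if keep == 1)


def micro_f1(gold: List[Sequence[int]], pred: List[Sequence[int]]) -> float:
    """Micro-averaged F1 of the 'keep' label, pooled over all documents."""
    tp = fp = fn = 0
    for gold_seq, pred_seq in zip(gold, pred):
        for g, p in zip(gold_seq, pred_seq):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compression_ratio(tokens: Sequence[str], labels: Sequence[int]) -> float:
    """Assumed definition: number of kept tokens / number of source tokens."""
    return sum(labels) / len(tokens)


# Hypothetical toy example: a pre-segmented sentence with gold and predicted labels.
tokens = ["北京", "今天", "发布", "了", "新", "的", "交通", "政策"]
gold_labels = [1, 0, 1, 0, 1, 0, 1, 1]
pred_labels = [1, 0, 1, 0, 0, 0, 1, 1]

headline = extract_headline(tokens, pred_labels)
score = micro_f1([gold_labels], [pred_labels])
ratio = compression_ratio(tokens, pred_labels)
print(headline, round(score, 4), ratio)
```

In the paper's setting the labels would come from the trained sequence labeling model, assisted by cluster and topic-word features; here they are hard-coded so the metric arithmetic can be checked by hand.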