Method of News Text Classification Combining Pooling and Attention Mechanism (池化和注意力相结合的新闻文本分类方法), cited by: 4
Abstract: In the information age, the Internet generates massive amounts of text data of great commercial and scientific value, and text classification has therefore attracted wide attention. Text classification plays an important role in applications such as information retrieval and is a key technology in research fields such as natural language processing. To address the characteristics of news text and the long training time of deep learning classification methods, this paper proposes a model that combines pooling with attention and applies it to Chinese news text classification. The model first extracts text features with max pooling and average pooling, then uses an attention mechanism to generate weights for sentences, and classifies on the concatenation of the two results. In experiments on the NLPCC2014 news text classification dataset, the model reaches a classification accuracy of 83.96% on first-level categories, close to the best result on that dataset, with a shorter convergence time than standard deep learning algorithms.
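The pipeline described in the abstract (max pooling, average pooling, attention weights over sentences, then classification on the concatenated result) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the input representation, the attention scoring function, or the classifier, so the sentence vectors, the dot-product scoring against a `query` vector, the linear softmax classifier, and all names below are assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_and_attend(sent_vecs, query):
    """Build one feature vector for a document.

    sent_vecs: list of sentence vectors (assumed input representation).
    query: hypothetical learned attention vector (assumption; the
           abstract does not say how attention scores are computed).
    Returns (features, weights): features is the concatenation
    [max-pooled | average-pooled | attention-weighted sum].
    """
    dims = range(len(sent_vecs[0]))
    # Max pooling: strongest activation per dimension across sentences.
    max_feat = [max(v[j] for v in sent_vecs) for j in dims]
    # Average pooling: mean activation per dimension across sentences.
    avg_feat = [sum(v[j] for v in sent_vecs) / len(sent_vecs) for j in dims]
    # Attention: one weight per sentence from a dot-product score.
    scores = [sum(a * b for a, b in zip(v, query)) for v in sent_vecs]
    weights = softmax(scores)
    att_feat = [sum(w * v[j] for w, v in zip(weights, sent_vecs)) for j in dims]
    return max_feat + avg_feat + att_feat, weights


def classify(features, W, b):
    """Linear layer followed by softmax (assumed classifier head)."""
    logits = [sum(w * f for w, f in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)


# Toy usage with random, untrained parameters.
random.seed(0)
N_SENT, D, N_CLASSES = 5, 8, 4
sent_vecs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_SENT)]
query = [random.gauss(0, 1) for _ in range(D)]
W = [[random.gauss(0, 1) for _ in range(3 * D)] for _ in range(N_CLASSES)]
b = [0.0] * N_CLASSES

features, weights = pool_and_attend(sent_vecs, query)
probs = classify(features, W, b)
print(len(features), round(sum(weights), 6), round(sum(probs), 6))
```

Because pooling and a single attention pass involve no recurrence or convolution stacks, a model of this shape has few parameters to train, which is consistent with the abstract's claim of faster convergence, though the sketch itself demonstrates nothing about accuracy or training time.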
Authors: TAO Yong-cai (陶永才); YANG Zhao-yang (杨朝阳); SHI Lei (石磊); WEI Lin (卫琳) (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; School of Software, Zhengzhou University, Zhengzhou 450002, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 11, pp. 2393-2397 (5 pages)
Funding: Supported by the Key Scientific Research Project of Higher Education Institutions of Henan Province (16A520027)
Keywords: text classification; attention mechanism; max pooling; machine learning