
Text classification method based on recurrent neural network variants and convolutional neural network

Cited by: 19
Abstract: Long texts make it hard to extract key semantic features, which degrades classification performance. To address this, a hybrid model combining a recurrent neural network variant with a convolutional neural network (BGRU-CNN) is built to classify Chinese long texts accurately. First, the PV-DM model represents the text as sentence vectors, which serve as the input to the neural network. Next, within the BGRU-CNN model, a bidirectional gated recurrent unit (B-GRU) encodes the sequence information of the text, a convolutional neural network (CNN) extracts its key features, and a Softmax classifier assigns the class. Finally, in tests on the SogouC and THUCNews Chinese corpora, classification accuracy reaches 89.87% and 94.65%, respectively. The results show that the convolutional layer further refines the sequence features extracted by the recurrent layer, which improves classification performance.
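The data flow described in the abstract (sentence vectors, then B-GRU, then CNN, then Softmax) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract gives no layer sizes, filter widths, or training details, so every dimension below is a hypothetical choice, the weights are random and untrained, biases are omitted, and the PV-DM output is replaced by random stand-in vectors. The sketch also assumes that a document enters the network as a sequence of sentence vectors.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class GRUCell:
    """Standard GRU cell with update (z), reset (r) and candidate (n) parts."""
    def __init__(self, in_dim, hid_dim, rng):
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # Blend the previous state and the candidate state with the update gate
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def bgru(seq, fwd, bwd, hid_dim):
    """Run one GRU left-to-right and one right-to-left; concatenate per step."""
    h, f_states = [0.0] * hid_dim, []
    for x in seq:
        h = fwd.step(x, h)
        f_states.append(h)
    h, b_states = [0.0] * hid_dim, []
    for x in reversed(seq):
        h = bwd.step(x, h)
        b_states.append(h)
    b_states.reverse()
    return [f + b for f, b in zip(f_states, b_states)]

def conv_maxpool(states, filters, width):
    """1-D convolution over time with ReLU, then max-over-time pooling."""
    pooled = []
    for filt in filters:  # each filter is a flat list of width * feature_dim weights
        acts = []
        for t in range(len(states) - width + 1):
            window = [v for s in states[t:t + width] for v in s]
            acts.append(max(0.0, sum(w * v for w, v in zip(filt, window))))
        pooled.append(max(acts))
    return pooled

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sizes, chosen only to keep the example small
SENT_DIM, HID, N_FILT, WIDTH, N_CLASSES, N_SENT = 8, 6, 4, 3, 5, 10
rng = random.Random(0)
fwd, bwd = GRUCell(SENT_DIM, HID, rng), GRUCell(SENT_DIM, HID, rng)
filters = rand_matrix(N_FILT, WIDTH * 2 * HID, rng)
W_out = rand_matrix(N_CLASSES, N_FILT, rng)

doc = rand_matrix(N_SENT, SENT_DIM, rng)         # stand-in for PV-DM sentence vectors
states = bgru(doc, fwd, bwd, HID)                # sequence representation (B-GRU)
features = conv_maxpool(states, filters, WIDTH)  # key features (CNN)
probs = softmax(matvec(W_out, features))         # class distribution (Softmax)
predicted = max(range(N_CLASSES), key=probs.__getitem__)
```

Because the weights are untrained, `predicted` carries no meaning here; the sketch only shows how each stage reshapes the data, from `N_SENT` sentence vectors to `N_SENT` bidirectional states of size `2 * HID`, then to one pooled value per filter, then to a probability per class.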
Authors: LI Yunhong; LIANG Sicheng; REN Jie; LI Minqi; ZHANG Bo; LI Yuxuan (School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China; State Grid Xi'an Power Supply Company, Xi'an 710032, China)
Source: Journal of Northwest University (Natural Science Edition), 2019, No. 4, pp. 573-579 (7 pages). Indexed in: CAS, CSCD, Peking University Core Journals.
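Funding: National Natural Science Foundation of China (61471161); Key Project of Natural Science Basic Research, Shaanxi Provincial Department of Science and Technology (2016JZ026); Xi'an Polytechnic University Undergraduate Innovation and Entrepreneurship Project (chx201824)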
Keywords: text classification; sentence vector; recurrent neural network; convolutional neural network