Abstract
Extracting key semantic features from long texts is difficult, and classification performance suffers as a result. To address this, a hybrid model (BGRU-CNN) combining a recurrent neural network variant with a convolutional neural network is built to classify Chinese long texts accurately. First, the PV-DM model represents the text as sentence vectors, which serve as the input to the neural network. Then the BGRU-CNN model is built: a bidirectional gated recurrent unit (B-GRU) represents the sequence information of the text, a convolutional neural network (CNN) extracts its key features, and a Softmax classifier performs the classification. Finally, in tests on the SogouC and THUCNews Chinese corpora, the classification accuracy reaches 89.87% and 94.65%, respectively. The results show that the convolutional layer further refines the text sequence features extracted by the recurrent layer, improving classification performance.
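The pipeline in the abstract (sentence vectors, then B-GRU, then CNN, then Softmax) can be sketched as a single forward pass. The NumPy code below is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no layer sizes, so `d_in`, `d_h`, `n_filters`, `k` and `n_classes` are hypothetical; the ReLU activation and max-over-time pooling are assumed; the weights are random and untrained; and random vectors stand in for real PV-DM sentence vectors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, d_in, d_h):
    # Index 0: update gate, 1: reset gate, 2: candidate state.
    return {"W": rng.normal(0, 0.1, (3, d_h, d_in)),
            "U": rng.normal(0, 0.1, (3, d_h, d_h)),
            "b": np.zeros((3, d_h))}

def gru_run(p, xs):
    """Run a GRU over xs of shape (T, d_in); return hidden states (T, d_h)."""
    h = np.zeros(p["b"].shape[1])
    states = []
    for x in xs:
        z = sigmoid(p["W"][0] @ x + p["U"][0] @ h + p["b"][0])        # update gate
        r = sigmoid(p["W"][1] @ x + p["U"][1] @ h + p["b"][1])        # reset gate
        c = np.tanh(p["W"][2] @ x + p["U"][2] @ (r * h) + p["b"][2])  # candidate
        h = (1 - z) * h + z * c  # one common convention; some texts swap z and 1 - z
        states.append(h)
    return np.stack(states)

def bgru(fwd, bwd, xs):
    """Concatenate forward and (time-realigned) backward states: (T, 2 * d_h)."""
    h_fwd = gru_run(fwd, xs)
    h_bwd = gru_run(bwd, xs[::-1])[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

def conv_maxpool(h, kernels, bias):
    """1-D convolution over time with ReLU, then max over time.
    kernels has shape (n_filters, k, d); requires T >= k."""
    n_filters, k, _ = kernels.shape
    steps = h.shape[0] - k + 1
    feats = np.empty((steps, n_filters))
    for t in range(steps):
        window = h[t:t + k]  # (k, d) slice of consecutive B-GRU states
        feats[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(feats, 0.0).max(axis=0)

def softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

def init_params(seed=0, d_in=100, d_h=64, n_filters=32, k=3, n_classes=10):
    # All sizes here are hypothetical; the abstract does not state them.
    rng = np.random.default_rng(seed)
    return {"fwd": init_gru(rng, d_in, d_h),
            "bwd": init_gru(rng, d_in, d_h),
            "K": rng.normal(0, 0.1, (n_filters, k, 2 * d_h)),
            "kb": np.zeros(n_filters),
            "Wo": rng.normal(0, 0.1, (n_classes, n_filters)),
            "bo": np.zeros(n_classes)}

def bgru_cnn_forward(params, sent_vecs):
    """sent_vecs: (T, d_in), one row per sentence vector of a document."""
    h = bgru(params["fwd"], params["bwd"], sent_vecs)    # sequence representation
    f = conv_maxpool(h, params["K"], params["kb"])       # key-feature extraction
    return softmax(params["Wo"] @ f + params["bo"])      # class probabilities

if __name__ == "__main__":
    params = init_params()
    # Stand-in for 12 PV-DM sentence vectors of one document.
    doc = np.random.default_rng(1).normal(size=(12, 100))
    probs = bgru_cnn_forward(params, doc)
    print(probs.shape, float(probs.sum()))
```

Treating a document as a sequence of sentence vectors keeps the recurrent layer short even for long texts, which is consistent with the motivation given in the abstract. In practice the weights would be trained with a cross-entropy loss in a deep learning framework rather than left random as they are here.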
Authors
李云红
梁思程
任劼
李敏奇
张博
李禹萱
LI Yunhong; LIANG Sicheng; REN Jie; LI Minqi; ZHANG Bo; LI Yuxuan (School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China; State Grid Xi'an Power Supply Company, Xi'an 710032, China)
Source
《西北大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals
2019, Issue 4, pp. 573-579 (7 pages)
Journal of Northwest University(Natural Science Edition)
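Funding
National Natural Science Foundation of China (61471161)
Key Project of Natural Science Basic Research, Shaanxi Provincial Department of Science and Technology (2016JZ026)
Xi'an Polytechnic University Undergraduate Innovation and Entrepreneurship Project (chx201824)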
Keywords
text classification
sentence vector
recurrent neural network
convolutional neural network