Abstract
Hybrid text classification models that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs) usually rely on single-channel word embeddings. A single-channel embedding has low spatial dimensionality and a single feature representation, so a one-dimensional CNN cannot fully learn the spatial features of the text, which limits model performance. This paper therefore proposes a hybrid neural network text classification model that fuses channel features. The model uses two-channel word embeddings to enrich the text representation and increase its spatial dimensionality, fuses channel features during convolution, and improves the way spatial and temporal features are combined, which raises the classification performance of the hybrid model. In experiments on IMDB, 20 NewsGroups, the Fudan Chinese dataset, and the THUC dataset, the proposed model improves classification accuracy by 1% on average compared with a conventional convolutional neural network, with the largest gain, 1.3%, on the THUC dataset.
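The abstract does not spell out the architecture, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions: two embedding tables supply two channels per token, each convolution filter spans both channels so their features are summed into a single response (channel fusion), and a simple tanh RNN reads the resulting feature sequence (spatial features first, then temporal ones) before a softmax classifier. The table names `emb_static` and `emb_tuned`, the ReLU and tanh activations, the last-hidden-state readout, and every helper function here are hypothetical; the paper's actual embedding sources, fusion scheme, and recurrent cell may differ, and no training is shown.

```python
import math
import random

# Hypothetical sketch, not the authors' implementation: two embedding
# channels -> convolution whose filters span both channels -> tanh RNN -> softmax.


def embed(tokens, table):
    """Look up each token id in an embedding table (a list of vectors)."""
    return [table[t] for t in tokens]


def conv_fused(channels, kernels, bias, width):
    """Convolve over time with filters that cover every input channel.

    channels: list of C sequences, each of length T with d-dim vectors.
    kernels[f][c][k] is the d-dim weight vector of filter f for channel c at offset k.
    Each output value sums over channels, offsets and embedding dims, which is
    where the channel features are fused. Returns T - width + 1 steps of F values.
    """
    T = len(channels[0])
    out = []
    for t in range(T - width + 1):
        step = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for c, seq in enumerate(channels):
                for k in range(width):
                    s += sum(w * x for w, x in zip(kern[c][k], seq[t + k]))
            step.append(max(0.0, s))  # ReLU (assumed activation)
        out.append(step)
    return out


def rnn_last_state(seq, W_in, W_h, hidden):
    """Plain tanh RNN over the conv feature sequence; returns the final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        h = [math.tanh(sum(W_in[i][j] * x[j] for j in range(len(x)))
                       + sum(W_h[i][j] * h[j] for j in range(hidden)))
             for i in range(hidden)]
    return h


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(tokens, params):
    # Two channels per token; "static" vs "tuned" naming is a hypothetical choice.
    static_ch = embed(tokens, params["emb_static"])
    tuned_ch = embed(tokens, params["emb_tuned"])
    feats = conv_fused([static_ch, tuned_ch], params["kernels"],
                       params["conv_bias"], params["width"])
    h = rnn_last_state(feats, params["W_in"], params["W_h"], params["hidden"])
    logits = [sum(w * v for w, v in zip(row, h)) for row in params["W_out"]]
    return softmax(logits)


def random_params(vocab, dim, filters, width, hidden, classes, seed=0):
    """Random (untrained) weights with the shapes the functions above expect."""
    rng = random.Random(seed)
    r = lambda: rng.uniform(-0.5, 0.5)
    return {
        "emb_static": [[r() for _ in range(dim)] for _ in range(vocab)],
        "emb_tuned": [[r() for _ in range(dim)] for _ in range(vocab)],
        # shape: filters x 2 channels x width x dim
        "kernels": [[[[r() for _ in range(dim)] for _ in range(width)]
                     for _ in range(2)] for _ in range(filters)],
        "conv_bias": [0.0] * filters,
        "width": width,
        "W_in": [[r() for _ in range(filters)] for _ in range(hidden)],
        "W_h": [[r() for _ in range(hidden)] for _ in range(hidden)],
        "hidden": hidden,
        "W_out": [[r() for _ in range(hidden)] for _ in range(classes)],
    }


params = random_params(vocab=10, dim=4, filters=3, width=2, hidden=5, classes=2)
probs = classify([1, 4, 2, 7, 3], params)
print(probs)  # two class probabilities that sum to 1
```

With random weights the output is only a valid probability distribution over the two classes; in practice the parameters would be learned, and a deep learning framework's multi-channel convolution and recurrent layers would replace these explicit loops.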
Authors
HAN Yongpeng; CHEN Cai; SU Hang; LIANG Yi (Faculty of Information, Beijing University of Technology, Beijing 100124, China)
Source
Journal of Chinese Information Processing (中文信息学报)
Indexed in: CSCD; Peking University Core Journals
2021, No. 2, pp. 78-88 (11 pages)
Funding
National Natural Science Foundation of China (61672505, 91546111).
Keywords
channel feature
neural network
text classification