
CCM-MF: Chinese-text Classification Model Based on Fused Multi-dimensional Features (cited by: 1)
Abstract Features of different dimensions in Chinese text carry different semantic information. To address this, this paper proposes CCM-MF (Chinese-text Classification Model Based on Fused Multi-dimensional Features), a model that fuses hierarchical-dimension and spatial-dimension features to improve the accuracy of Chinese text classification. First, in the hierarchical dimension, the pre-trained model ERNIE (Enhanced Representation through Knowledge Integration) is used to obtain word vectors containing character-, word-, and entity-level features. Then, in the spatial dimension, these word vectors are fed separately into an improved Deep Pyramid Convolutional Neural Network (DPCNN) and an Attention-Based Bidirectional Long Short-Term Memory network (Att-BLSTM) to obtain local and global semantic features, respectively. Finally, each set of spatial-dimension features is passed to a Softmax classifier, and the resulting outputs are fused to produce the classification result. In experiments on multiple public datasets, the model achieves higher accuracy than existing mainstream text classification methods, which supports its effectiveness.
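The final step of the abstract (two branches, each with its own Softmax output, followed by fusion) can be illustrated with a small sketch. The abstract does not state which fusion rule the authors use, so the weighted average below is an assumption, and the function names, the `weight` parameter, and the example logits are all hypothetical. The logits stand in for the outputs of the DPCNN-style (local) and Att-BLSTM-style (global) branches; neither network is implemented here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(local_logits, global_logits, weight=0.5):
    """Late fusion of two branch outputs.

    Each branch is passed through its own softmax, then the two probability
    vectors are combined as a convex combination. `weight` is the share given
    to the local branch (an assumed rule, not taken from the paper).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(local_logits) != len(global_logits):
        raise ValueError("both branches must score the same classes")
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    fused = [weight * a + (1.0 - weight) * b
             for a, b in zip(p_local, p_global)]
    # Predicted class is the index of the largest fused probability.
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label


# Made-up logits for a 3-class problem, for illustration only.
local_logits = [2.0, 0.5, 0.1]    # stand-in for the local-feature branch
global_logits = [0.2, 1.5, 0.3]   # stand-in for the global-feature branch
fused, label = fuse_predictions(local_logits, global_logits)
print(fused, label)
```

Because each branch output is already a probability distribution, any convex combination of the two is also a distribution, so the fused scores can be read as class probabilities without renormalizing.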
Authors MA Zichen (马子晨); ZHANG Shunxiang (张顺香); LIU Yunduo (刘云朵); WANG Xingguang (王星光); ZHANG Youqiang (张友强) (School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui, 232001, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui, 230088, China)
Source Guangxi Sciences (《广西科学》), indexed in CAS and the Peking University Core Journals list, 2023, No. 1, pp. 35-42 (8 pages)
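Funding Supported by the General Program of the National Natural Science Foundation of China (62076006) and the University Collaborative Innovation Program of Anhui Province (GXXT 2021008).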
Keywords Chinese text classification; multiple dimensions; ERNIE; DPCNN; Att-BLSTM
