A Chinese Short Text Classification Method Based on the Features of Words and Characters with Multiple Convolution Kernels (基于多卷积核字词特征的中文短文本分类方法)
Abstract Chinese short texts are brief, highly ambiguous, and irregular in form, which makes their feature information difficult to extract and represent. Most current text classification methods extract local text features with a single-kernel convolutional neural network, which often leads to poor classification results because of inconsistent random initialization of the network parameters. To address this, a short text classification model based on multi-kernel character and word features, MCFCW (Multi-CNN Fusion of Characters and Words), is proposed. First, pre-trained ERNIE and Word2vec models are used to enrich the character and word embedding representations of the text. Then, multi-kernel TextCNN and DPCNN are applied, respectively, to extract semantic information from different perspectives while reducing the influence of random parameter initialization. Finally, the high-level character and word feature vectors extracted by the two channels are concatenated to form the final text classification feature. The model is evaluated on the THUCNews news headline dataset. The results show that it outperforms current mainstream models on all three evaluation metrics (precision, recall, and F1 score) and classifies short texts effectively.
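The two-channel idea in the abstract (several convolution kernels per channel, then concatenation of the channel features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`conv1d_max_pool`, `multi_kernel_features`, `fuse_channels`), the toy embeddings, and the kernel weights are all hypothetical, and both channels use the same simple convolution with max-over-time pooling, whereas the paper's second channel uses the deeper DPCNN. In the real model the inputs would come from pre-trained ERNIE and Word2vec embeddings.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one embedding vector per token


def conv1d_max_pool(embeddings: Matrix, kernel: Sequence[Sequence[float]],
                    bias: float = 0.0) -> float:
    """Slide one kernel over the token axis, apply ReLU, then max-pool over time."""
    k = len(kernel)  # kernel height = number of consecutive tokens it covers
    if len(embeddings) < k:
        raise ValueError("sequence is shorter than the kernel height")
    best = 0.0  # ReLU outputs are never below zero
    for start in range(len(embeddings) - k + 1):
        total = bias
        for row, weights in zip(embeddings[start:start + k], kernel):
            total += sum(x * w for x, w in zip(row, weights))
        best = max(best, total)
    return best


def multi_kernel_features(embeddings: Matrix, kernels) -> List[float]:
    """One pooled feature per kernel; kernels may have different heights."""
    return [conv1d_max_pool(embeddings, kernel) for kernel in kernels]


def fuse_channels(char_features: List[float], word_features: List[float]) -> List[float]:
    """Concatenate the character-channel and word-channel feature vectors."""
    return list(char_features) + list(word_features)


# Toy character-level embeddings (4 tokens, dimension 2); values are made up.
char_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
char_kernels = [
    [[1.0, 0.0], [0.0, 1.0]],              # height 2
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # height 3
]

# Toy word-level embeddings (2 tokens, dimension 2); values are made up.
word_emb = [[0.5, -1.0], [2.0, 0.5]]
word_kernels = [
    [[1.0, 1.0], [1.0, -1.0]],
    [[-1.0, 0.0], [-1.0, 0.0]],  # negative response, clipped to 0 by ReLU
]

fused = fuse_channels(
    multi_kernel_features(char_emb, char_kernels),
    multi_kernel_features(word_emb, word_kernels),
)
print(fused)
```

Each kernel contributes one pooled value, so using several kernels of different heights gives the classifier several views of the same text, and the fused vector simply places the character-channel features ahead of the word-channel features.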
Authors LI Pan (李攀); WU Yadong (吴亚东); CHU Qikai (褚琦凯); ZHANG Guiyu (张贵宇); FU Chaoshuai (付朝帅) (School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, China; School of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin 644000, China; Artificial Intelligence Key Laboratory of Sichuan Province, Yibin 644000, China; Big Data Visual Analysis Engineering Technology Laboratory of Sichuan Province, Yibin 644000, China)
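Source Journal of Sichuan University of Science & Engineering (Natural Science Edition) (《四川轻化工大学学报(自然科学版)》), CAS, 2023, No. 1, pp. 73-83 (11 pages)
Funding Sichuan Province Science and Technology Achievement Transfer and Transformation Demonstration Project (2020ZHCG0040); Sichuan Province Major Science and Technology Special Project (2018GZDZX0045).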
Keywords Chinese short text classification; ERNIE; Word2vec; features of words and characters with multiple convolution kernels; convolutional neural network