Text Classification Based on LDA and BiGRU (cited by 3)
Abstract: Text classification is a fundamental task in natural language processing. Both the sparsity of features in text and the neural network used to extract those features affect classification performance. To address insufficient feature information in text and the weak modeling of contextual dependencies in traditional models, we propose a classification method that fuses TF-IDF-weighted word vectors with an LDA topic model and uses a bidirectional gated recurrent unit layer (BiGRU) to fully extract deep features of the text. The main dataset is the news text classification dataset from the Tianchi competition. First, Word2vec and LDA models are each trained on the corpus to produce word vectors. The Word2vec vectors, weighted by TF-IDF, are then simply concatenated with the LDA-trained word vectors expanded by maximum topic probability, yielding a text matrix. The text matrix is fed into the BiGRU network, which extracts feature vectors carrying the deep information of the text by reading it in both the forward and the backward direction. Finally, a softmax function performs multi-class classification, and the category is determined from the output probabilities. Compared with existing common text classification models, the method considerably improves accuracy, F1 score, and other evaluation metrics.
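The abstract describes a four-stage pipeline: TF-IDF-weighted Word2vec vectors, LDA word vectors, row-wise concatenation into a text matrix, and a BiGRU followed by softmax. The dependency-free Python below is a minimal sketch of that data flow under stated assumptions, not the authors' implementation. All helper names (`tfidf_weights`, `lda_word_vector`, `text_matrix`, `GRU`, `classify`) are hypothetical; the corpus, embeddings, and topic-word matrix are toy values; the IDF is the smoothed form log((1 + N) / (1 + df)) + 1; and the LDA word vector is taken as the word's normalised column of a topic-word matrix, which stands in for the paper's "maximum topic probability expansion" because the abstract does not define that step. Weights are random and untrained, so the printed probabilities only demonstrate shapes. A practical version would train Word2vec and LDA with a library such as gensim and the BiGRU with a framework such as PyTorch (`nn.GRU` with `bidirectional=True`).

```python
import math
import random
from collections import Counter


def tfidf_weights(docs):
    """Per-document TF-IDF weight of every token; `docs` is a list of token lists.

    Assumption: smoothed IDF log((1 + N) / (1 + df)) + 1; the paper's variant is not stated.
    """
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        result.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                       for t, c in tf.items()})
    return result


def lda_word_vector(word, phi):
    """Hypothetical LDA word vector: the word's column of a K x V topic-word matrix
    (`phi` is a list of {word: p(word | topic)} dicts), normalised to sum to 1.
    This is proportional to p(topic | word) under a uniform topic prior."""
    col = [topic.get(word, 0.0) for topic in phi]
    total = sum(col) or 1.0  # unseen word -> all-zero vector
    return [p / total for p in col]


def text_matrix(doc, weights, w2v, phi):
    """One row per token: TF-IDF-weighted Word2vec vector concatenated with its LDA vector."""
    return [[weights[t] * x for x in w2v[t]] + lda_word_vector(t, phi) for t in doc]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRU:
    """Minimal GRU cell, forward pass only, random weights, biases omitted for brevity."""

    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = {g: rand_matrix(hid, in_dim, rng) for g in "zrn"}  # input weights
        self.U = {g: rand_matrix(hid, hid, rng) for g in "zrn"}     # recurrent weights

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]  # candidate
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]  # new hidden state

    def run(self, rows):
        h = [0.0] * self.hid
        for x in rows:
            h = self.step(x, h)
        return h  # final hidden state


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def classify(matrix, fwd, bwd, w_out):
    """BiGRU: read rows forwards and backwards, concatenate final states, then linear + softmax."""
    feats = fwd.run(matrix) + bwd.run(list(reversed(matrix)))
    return softmax(matvec(w_out, feats))


# --- Toy usage: every value below is made up for illustration ---
rng = random.Random(0)
docs = [["stock", "market", "rises"], ["team", "wins", "match"], ["market", "falls"]]
w2v = {t: [rng.uniform(-1, 1) for _ in range(4)] for doc in docs for t in doc}  # stand-in for Word2vec, dim 4
phi = [{"stock": 0.4, "market": 0.5, "rises": 0.1},   # toy topic 0 ("finance")
       {"team": 0.3, "wins": 0.3, "match": 0.4}]     # toy topic 1 ("sports")
weights = tfidf_weights(docs)
m = text_matrix(docs[0], weights[0], w2v, phi)        # 3 tokens x (4 + 2) columns
fwd, bwd = GRU(6, 8, rng), GRU(6, 8, rng)
w_out = rand_matrix(3, 16, rng)                       # 3 classes, 2 * 8 BiGRU features
probs = classify(m, fwd, bwd, w_out)
print([round(p, 3) for p in probs])
```

Concatenating the final forward state with the final backward state (which has read the text right to left) is one common way to pool a BiGRU into a fixed-length feature vector; the abstract does not say which pooling the authors used, so treat this choice as an assumption as well.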
Authors: XIAN Guang-ming; WANG Lu-dong; ZENG Bi-qing; MEI Hao-yang; TAO Rui (School of Software, South China Normal University, Foshan 528225, China)
Source: Computer Technology and Development, 2022, Issue 4, pp. 15-20 (6 pages)
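Funding: National Natural Science Foundation of China (61876067); Special Project in Key Fields of Artificial Intelligence for Regular Higher Education Institutions of Guangdong Province (2019KZDZX1033).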
Keywords: LDA topic model; BiGRU; Word2vec; deep learning; text classification