Abstract
To address the difficulty of obtaining large amounts of labeled data and the incomplete learning of text features in sentiment analysis of Weibo text, the long short-term memory network (LSTM) and its derivative, the bidirectional long short-term memory network (Bi-LSTM), are introduced into the variational autoencoder generative model to construct a semi-supervised text classification model based on variational autoencoding. LSTM serves as the encoder and decoder of the variational autoencoder, and Bi-LSTM serves as the classifier. The classifier provides label information to the encoder so that the two jointly generate the hidden variable, and it also works with the hidden variable to reconstruct the data through the decoder, so that useful information in the unlabeled data improves the classifier's performance. Experiments comparing the model with other methods on the same public dataset show that it achieves better classification results.
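The coupling described in the abstract, in which the classifier supplies label information to the encoder and the decoder reconstructs the text from the label together with the hidden variable, resembles the standard semi-supervised variational autoencoder objective (the M2 model of Kingma et al., 2014). The abstract does not state the exact loss, so the following is only a sketch under that assumption. The notation is introduced here for illustration: $x$ is a text, $y$ its sentiment label, $z$ the hidden variable, $q_\phi(z \mid x, y)$ the LSTM encoder, $p_\theta(x \mid y, z)$ the LSTM decoder, $q_\phi(y \mid x)$ the Bi-LSTM classifier, $D_l$ and $D_u$ the labeled and unlabeled sets, and $\alpha$ an assumed weighting hyperparameter.

```latex
\begin{align}
% Labeled bound: reconstruct x from (y, z), regularized by the priors on y and z
-\mathcal{L}(x, y) &= \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid y, z) + \log p(y) + \log p(z) - \log q_\phi(z \mid x, y)\big] \\
% Unlabeled bound: the classifier marginalizes over the unknown label
-\mathcal{U}(x) &= \sum_{y} q_\phi(y \mid x)\,\big(-\mathcal{L}(x, y)\big) + \mathcal{H}\big(q_\phi(y \mid x)\big) \\
% Total objective to minimize: both bounds plus a supervised classification term
\mathcal{J}^{\alpha} &= \sum_{(x, y) \in D_l} \mathcal{L}(x, y) + \sum_{x \in D_u} \mathcal{U}(x) + \alpha \sum_{(x, y) \in D_l} \big[-\log q_\phi(y \mid x)\big]
\end{align}
```

Under this formulation the unlabeled term is where unlabeled data can influence the classifier: $q_\phi(y \mid x)$ weights each candidate label's reconstruction bound, so gradients from reconstructing unlabeled text flow back into the classifier.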
Authors
Han Ping (韩萍)
Liu Shuang (刘爽)
Jia Yunfei (贾云飞)
Sun Jiahui (孙佳慧)
School of Electronic Information and Automation, China Civil Aviation University, Tianjin 300300, China; School of Computer Science and Technology, China Civil Aviation University, Tianjin 300300, China; Basic Laboratory Center, China Civil Aviation University, Tianjin 300300, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2021, Issue 12, pp. 280-285 (6 pages)
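Funding
Civil Aviation Safety Capacity Building Project (20600418)
China Civil Aviation University special project under the Fundamental Research Funds for the Central Universities (3122019114)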
Keywords
Weibo text
Sentiment analysis
Variational autoencoder