Emphasizing Essential Words for Sentiment Classification Based on Recurrent Neural Networks (cited by: 13)


Abstract: With the explosion of online communication and publication, texts have become obtainable via forums, chat messages, blogs, book reviews and movie reviews. Usually, these texts are very short and noisy, lacking the statistical signals and information needed for good semantic analysis. Traditional natural language processing methods, such as bag-of-words (BOW) based probabilistic latent semantic models, fail to achieve high performance in this short-text setting. Recent research has focused on the correlations between words, i.e., term dependencies, which could help mine the latent semantics hidden in short texts and help people understand them. A long short-term memory (LSTM) network can capture term dependencies and can retain information for long periods of time. LSTM has been widely used and has obtained promising results on a variety of problems involving the latent semantics of texts. At the same time, by analyzing the texts, we find that a number of keywords contribute greatly to the semantics of the texts. In this paper, we establish a keyword vocabulary and propose an LSTM-based model that is sensitive to the words in the vocabulary; hence, the keywords leverage the semantics of the full document. The proposed model is evaluated on a short-text sentiment analysis task using two datasets: IMDB and SemEval-2016. Experimental results demonstrate that our model outperforms the baseline LSTM by 1% to 2% in terms of accuracy and achieves a significant performance improvement over several non-recurrent neural network latent semantic models, especially on short texts. We also incorporate the idea into a variant of LSTM, the gated recurrent unit (GRU) model, and achieve good performance, which shows that our method is general enough to improve different deep learning models.

Authors: Fei Hu, Li Li, Zi-Li Zhang, Jing-Yuan Wang, Xiao-Fei Xu

Affiliations: College of Computer and Information Science; Network Centre

Source: Journal of Computer Science & Technology (计算机科学技术学报（英文版）), indexed in SCIE, EI, CSCD; 2017, No. 4, pp. 785-795 (11 pages)

Keywords: short text understanding; long short-term memory (LSTM); gated recurrent unit (GRU); sentiment classification; deep learning

Classification: TP [Automation and Computer Technology]

