Sentiment word embedding has been extensively studied and used in sentiment analysis tasks.However,most existing models have failed to differentiate high-frequency and low-frequency words.Accordingly,the sentiment inf...Sentiment word embedding has been extensively studied and used in sentiment analysis tasks.However,most existing models have failed to differentiate high-frequency and low-frequency words.Accordingly,the sentiment information of low-frequency words is insufficiently captured,thus resulting in inaccurate sentiment word embedding and degradation of overall performance of sentiment analysis.A Bayesian estimation-based sentiment word embedding(BESWE)model,which aims to precisely extract the sentiment information of low-frequency words,has been proposed.In the model,a Bayesian estimator is constructed based on the co-occurrence probabilities and sentiment proba-bilities of words,and a novel loss function is defined for sentiment word embedding learning.The experimental results based on the sentiment lexicons and Movie Review dataset show that BESWE outperforms many state-of-the-art methods,for example,C&W,CBOW,GloVe,SE-HyRank and DLJT1,in sentiment analysis tasks,which demonstrate that Bayesian estimation can effectively capture the sentiment information of low-frequency words and integrate the sentiment information into the word embedding through the loss function.In addition,replacing the embedding of low-frequency words in the state-of-the-art methods with BESWE can significantly improve the performance of those methods in sentiment analysis tasks.展开更多
基金Funding information National Statistical Science Research Project of China,Grant/Award Number:2016LY98Science and Technology Department of Guangdong Province in China,Grant/Award Numbers:2016A010101020,2016A010101021,2016A010101022+2 种基金Characteristic Innovation Projects of Guangdong Colleges and Universities,Grant/Award Numbers:2018KTSCX049,2018GKTSCX069Science and Technology Plan Project of Guangzhou,Grant/Award Numbers:201802010033,201903010013Bidding Project of Laboratory of Language Engineering and Computing of Guangdong University of Foreign Studies,Grant/Award Number:LEC2019ZBKT005。
文摘Sentiment word embedding has been extensively studied and used in sentiment analysis tasks.However,most existing models have failed to differentiate high-frequency and low-frequency words.Accordingly,the sentiment information of low-frequency words is insufficiently captured,thus resulting in inaccurate sentiment word embedding and degradation of overall performance of sentiment analysis.A Bayesian estimation-based sentiment word embedding(BESWE)model,which aims to precisely extract the sentiment information of low-frequency words,has been proposed.In the model,a Bayesian estimator is constructed based on the co-occurrence probabilities and sentiment proba-bilities of words,and a novel loss function is defined for sentiment word embedding learning.The experimental results based on the sentiment lexicons and Movie Review dataset show that BESWE outperforms many state-of-the-art methods,for example,C&W,CBOW,GloVe,SE-HyRank and DLJT1,in sentiment analysis tasks,which demonstrate that Bayesian estimation can effectively capture the sentiment information of low-frequency words and integrate the sentiment information into the word embedding through the loss function.In addition,replacing the embedding of low-frequency words in the state-of-the-art methods with BESWE can significantly improve the performance of those methods in sentiment analysis tasks.