Abstract
Sentiment analysis, an active research topic, presents new challenges for understanding users' opinions and judgments expressed online. It aims to classify subjective texts by assigning them a polarity label. In this paper, we introduce a novel machine learning framework that uses an auto-encoder network to predict the sentiment polarity label at both the word level and the sentence level. Inspired by the dimensionality reduction and feature extraction capabilities of auto-encoders, we propose a new distributed word representation model, PMI-SA, which takes pointwise mutual information (PMI) word vectors as input. The resulting continuous word vectors are combined to represent a sentence. We also develop an unsupervised sentence embedding method, called Contextual Recursive Auto-Encoders (CoRAE), for learning sentence representations. CoRAE follows the basic idea of recursive auto-encoders, deeply composing the vectors of the words that make up a sentence, but without relying on any syntactic parse tree. Instead, it recursively combines each word with its context words (its previous and next neighbors) while taking word order into account. A support vector machine (SVM) classifier with fine-tuning is used to show that our deep compositional representation model, CoRAE, significantly improves the accuracy of the sentiment analysis task. Experimental results demonstrate that CoRAE outperforms several competitive baseline methods on two datasets, namely the Sanders Twitter corpus and a Facebook comments corpus. CoRAE achieves an accuracy of 83.28% on the Facebook dataset and 97.57% on the Sanders dataset.
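The abstract does not say how the PMI input vectors for PMI-SA are built (window size, handling of unseen pairs, whether negative values are clipped). The sketch below is one common way to do it, offered only as an illustration under stated assumptions: co-occurrences are counted in a symmetric window around each word, PMI(w, c) = log(P(w, c) / (P(w) P(c))) is computed from those counts, and negative values are clipped to zero (positive PMI). The function name `pmi_vectors` and its `window` and `positive` parameters are hypothetical. The auto-encoder that compresses these vectors into PMI-SA embeddings is not shown.

```python
import math
from collections import Counter


def pmi_vectors(sentences, window=1, positive=True):
    """Build PMI word vectors from tokenized sentences (illustrative sketch).

    Assumptions (not from the paper): symmetric co-occurrence window of
    `window` words; pairs never observed get 0.0 instead of -inf; with
    positive=True, negative PMI values are clipped to 0 (PPMI).
    Returns (vocab, vectors) where vectors[w] is a list aligned with vocab.
    """
    # Count (word, context) pairs inside the window around each token.
    pair_counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(w, tokens[j])] += 1

    # Marginal counts for words and contexts, derived from the pair counts.
    word_counts = Counter()
    context_counts = Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        context_counts[c] += n
    total = sum(pair_counts.values())

    vocab = sorted(word_counts)
    vectors = {}
    for w in vocab:
        row = []
        for c in vocab:
            n = pair_counts.get((w, c), 0)
            if n == 0:
                row.append(0.0)  # unseen pair: use 0.0 rather than log(0)
                continue
            # PMI = log(P(w,c) / (P(w) P(c))) = log(n * total / (n_w * n_c))
            pmi = math.log((n * total) / (word_counts[w] * context_counts[c]))
            row.append(max(pmi, 0.0) if positive else pmi)
        vectors[w] = row
    return vocab, vectors


sents = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
vocab, vecs = pmi_vectors(sents)
print(vocab)          # ['bad', 'good', 'movie', 'plot']
print(vecs["good"])   # PMI of "good" with each vocabulary word
```

Each row is a sparse, vocabulary-sized vector; the dimensionality reduction the abstract attributes to the auto-encoder is what would turn such rows into compact continuous word vectors.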