Opinion (sentiment) analysis of big data streams, from the continuously generated text on social media networks to hundreds of millions of online consumer reviews, gives organizations in every field an opportunity to extract valuable intelligence from massive volumes of user-generated text. However, traditional content analysis frameworks cannot efficiently handle the unprecedented volume of unstructured text streams or the complexity of the text analysis tasks that real-time opinion analysis requires. In this paper, we propose a parallel real-time sentiment analysis system, the Social Media Data Stream Sentiment Analysis Service (SMDSSAS), which performs multiple phases of sentiment analysis on social media text streams in real time, using two fully analytic opinion mining models to cope with both the scale of the streams and the complexity of sentiment analysis on unstructured text. The two aspect-based opinion mining models, a Deterministic and a Probabilistic sentiment model, perform real-time sentiment analysis on data streams related to a user-given topic. In experiments on Twitter stream traffic captured during the pre-election weeks of the 2016 U.S. presidential election, analyzing public opinion toward the two presidential candidates in real time, the proposed system correctly predicted Donald Trump as the winner. Cross-validation showed that the proposed sentiment models, combined with the real-time streaming components of the framework, analyzed opinions on the two candidates with an average accuracy of 81% for the Deterministic model and 80% for the Probabilistic model, an improvement of 1% to 22% over results reported in the existing literature.
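In the era of big data, quickly and accurately filtering useful information out of massive datasets has long been a research goal in natural language processing. Recognizing the intent behind a user's question helps a question answering system, during its interaction with the user, quickly determine the user's true intent from the direct or indirect information the user provides, filter out useless and redundant information, and return the most probable answer. FastText is a text classification and word representation training tool released by Facebook AI Research. Its main strength is that the model is simple while its text classification accuracy is close to that of existing deep learning methods, so it greatly shortens classification time while preserving accuracy.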
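To make the idea of a simple intent classifier that returns the most probable answer concrete, here is a minimal sketch using only the Python standard library. It is not the fastText library or its API, and it is not taken from the paper. It only illustrates the general recipe usually associated with fastText-style classifiers: hash words and word bigrams into buckets, average their embeddings, and feed the result to a linear softmax layer. The class name `TinyIntentClassifier`, the hyperparameters, and the toy training sentences are all assumptions made for illustration.

```python
import math
import random
import zlib


def features(text, buckets):
    """Hash words and word bigrams into a fixed number of buckets."""
    words = text.lower().split()
    grams = words + [a + " " + b for a, b in zip(words, words[1:])]
    # crc32 is used instead of hash() so bucket ids are stable across runs.
    return [zlib.crc32(g.encode("utf-8")) % buckets for g in grams]


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyIntentClassifier:
    """Hypothetical sketch: averaged n-gram embeddings + linear softmax."""

    def __init__(self, labels, dim=16, buckets=1 << 14, seed=0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.dim = dim
        self.buckets = buckets
        # Small random input embeddings; output weights start at zero.
        self.emb = [[rng.uniform(-1 / dim, 1 / dim) for _ in range(dim)]
                    for _ in range(buckets)]
        self.out = [[0.0] * dim for _ in self.labels]

    def _hidden(self, ids):
        if not ids:  # empty input: fall back to a zero vector
            return [0.0] * self.dim
        return [sum(self.emb[i][d] for i in ids) / len(ids)
                for d in range(self.dim)]

    def _probs(self, hidden):
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.out]
        return softmax(logits)

    def predict_proba(self, text):
        hidden = self._hidden(features(text, self.buckets))
        return dict(zip(self.labels, self._probs(hidden)))

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)  # the most probable intent

    def fit(self, examples, epochs=200, lr=0.5):
        for _ in range(epochs):
            for text, label in examples:
                ids = features(text, self.buckets)
                hidden = self._hidden(ids)
                probs = self._probs(hidden)
                target = self.labels.index(label)
                # Gradient of cross-entropy w.r.t. each logit: p_k - y_k.
                errs = [p - (1.0 if k == target else 0.0)
                        for k, p in enumerate(probs)]
                # Gradient at the hidden layer, taken before updating `out`.
                grad_h = [sum(errs[k] * self.out[k][d]
                              for k in range(len(errs)))
                          for d in range(self.dim)]
                for k, err in enumerate(errs):
                    for d in range(self.dim):
                        self.out[k][d] -= lr * err * hidden[d]
                for i in ids:
                    for d in range(self.dim):
                        self.emb[i][d] -= lr * grad_h[d] / len(ids)


# Toy data, invented for this example only.
TRAIN = [
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("is it sunny outside", "weather"),
    ("play some jazz music", "music"),
    ("put on my favorite song", "music"),
    ("play the next song", "music"),
]

model = TinyIntentClassifier(["weather", "music"])
model.fit(TRAIN)
```

Calling `model.predict_proba("play a song")` returns a probability for each intent, and `model.predict(...)` picks the largest one. The whole model is two small weight matrices, and that simplicity is the kind of property the abstract credits for short classification times.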