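Abstract
With the wide adoption of Web 2.0 concepts and technologies, user-generated content (UGC) on social media has had an important influence on the economy, politics, society, military affairs, diplomacy, and other areas, so research on automatically identifying the sentiment orientation of UGC has both theoretical significance and practical value. Addressing challenging open problems in automatic sentiment orientation identification, this study proposes a UGC sentiment identification method based on the joint optimization of feature selection and sentiment classification. The method is applied to real two-class and five-class sentiment identification tasks on Chinese and English UGC. Using corpora in two languages, Chinese movie reviews from Douban and English movie reviews from IMDb, the study builds and tests a sentiment identification model based on this joint optimization. A series of experiments shows that the proposed model improves the performance of automatic UGC sentiment identification, which indicates that the model is effective for automatic sentiment analysis in both Chinese and English.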
With the wide application of the concepts and techniques of Web 2.0, User Generated Content (UGC) on social media has had a significant impact on the economy, politics, society, the military, diplomacy, and other areas. The study of automatic UGC sentiment analysis therefore has both theoretical significance and practical value. Sentiment analysis is also known as emotional polarity computation, opinion extraction, or semantic classification. The performance of automatic sentiment analysis depends primarily on feature selection and sentiment classification. Latent Semantic Analysis (LSA) and Support Vector Machines (SVM) are two important techniques that are widely applied in sentiment classification. However, few studies optimize these two approaches jointly in sentiment analysis, and the effectiveness of applying a synchronization optimization approach based on feature selection and sentiment classification to sentiment analysis remains unclear. This paper proposes an automatic sentiment analysis method based on feature selection and sentiment classification optimization, aimed at the challenging problems that remain unsolved in current automatic sentiment analysis. We propose a two-stage synchronization optimization-based approach that uses LSA for UGC feature selection and SVM as the sentiment classification engine. The approach uses a particle swarm optimization (PSO) algorithm to obtain an optimal combination of UGC feature dimensions and SVM parameters. First, we provide an overview of our UGC sentiment analysis method based on feature selection and sentiment classification optimization. Then we elaborate on its components. Finally, the method is applied to real two-class and five-class automatic sentiment analysis of Chinese and English UGC.
Sentiment analysis models based on feature selection and sentiment classification optimization are built and tested on datasets in two languages: Douban movie reviews in Chinese and IMDb movie reviews in English. Some sentiment analysis studies have used the IMDb movie review dataset. Douban is the most popular movie review website in China. The results of a series of experiments show that the proposed automatic sentiment analysis models based on feature selection and sentiment classification optimization can boost the performance of automatic UGC sentiment analysis. These results indicate that the model is effective for both Chinese and English automatic sentiment analysis.
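The joint search described in the abstract, where PSO picks an LSA feature dimension together with the SVM parameters, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not give the PSO settings, the search ranges, or the SVM kernel, so the bounds, the RBF-style parameters `C` and `gamma`, and the function names are hypothetical. In particular, `toy_fitness` is a made-up stand-in for what would in practice be the cross-validated accuracy of an SVM trained on k-dimensional LSA features.

```python
import random


def pso_maximize(fitness, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness(position) over the box `bounds` with basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def decode(position):
    """Map a particle to (LSA dimension k, SVM C, kernel gamma)."""
    k = int(round(position[0]))      # number of LSA dimensions (integer)
    C = 10.0 ** position[1]          # SVM penalty, searched on a log10 scale
    gamma = 10.0 ** position[2]      # kernel width, searched on a log10 scale
    return k, C, gamma


def toy_fitness(position):
    """Hypothetical stand-in for cross-validated accuracy of LSA + SVM."""
    k, _, _ = decode(position)
    return (1.0
            - ((k - 120) / 300.0) ** 2
            - 0.05 * (position[1] - 1.0) ** 2
            - 0.05 * (position[2] + 2.0) ** 2)


# Assumed search space: k in [10, 300], log10(C) in [-2, 3], log10(gamma) in [-4, 1].
BOUNDS = [(10.0, 300.0), (-2.0, 3.0), (-4.0, 1.0)]
best_position, best_score = pso_maximize(toy_fitness, BOUNDS)
best_k, best_C, best_gamma = decode(best_position)
print(best_k, best_C, best_gamma, best_score)
```

Encoding the feature dimension and the classifier parameters in one particle is what makes the optimization joint: each fitness evaluation scores a complete (features, classifier) configuration, so the two choices are tuned against each other instead of one after the other.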
Authors
李欣苗
陈云
LI Xin-miao; CHEN Yun (School of Information Management & Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China; School of Finance, Shanghai University of Finance and Economics, Shanghai 200433, China; Shanghai Key Laboratory of Financial Information Technology (Shanghai University of Finance and Economics), Shanghai 200433, China)
Source
《管理工程学报》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2019, No. 2, pp. 61-71 (11 pages)
Journal of Industrial Engineering and Engineering Management
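Funding
National Natural Science Foundation of China (71001059)
Natural Science Foundation of Shanghai (14ZR1413400)
Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (14511107202)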
Keywords
Feature selection and sentiment classification optimization
Sentiment analysis
Particle swarm optimization (PSO)
User generated content (UGC)