Abstract
An important application of sentiment analysis is determining the sentiment orientation of users' product reviews, which are generally short texts. Traditional methods mostly use the bag-of-words model to obtain shallow word-level features for sentiment analysis. Models trained on such simple features perform poorly on short texts, especially those with complex syntax. This work uses a deep recursive neural network to capture sentence-level semantic information and introduces a Chinese sentiment training treebank as training data to learn the sentiment of words, achieving relatively high accuracy on five-class sentiment classification of short texts. To address the training-time cost of a complex model on massive data, the model is parallelized on the Spark framework, which improves its scalability and time efficiency.
Authors
Xie Tie (谢铁)
Zheng Xiao (郑啸)
Zhang Lei (张雷)
Wang Xiujun (王修君)
School of Computer Science and Technology, Anhui University of Technology, Maanshan 243002, Anhui, China
Source
Computer Applications and Software (《计算机应用与软件》)
2017, No. 3, pp. 205-211, 232 (8 pages)
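Funding
National Natural Science Foundation of China (61402008, 61402009)
Anhui Province Major Science and Technology Special Project (16030901060)
Major Natural Science Research Project of Anhui Provincial Universities (KJ2014ZD05)
Anhui Provincial Support Program for Outstanding Young Talents in Universities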