Abstract
To perform sentiment analysis on batches of comments and mine their value efficiently, this paper designs and implements a Spark-based sentiment analysis tool for text comments. First, the data are preprocessed so that the training classes remain balanced, and the text is segmented into words with jieba. Second, a Word2Vec model converts the segmented comments into word vectors. Finally, the resulting sentence vectors are used as classifier input to train the classification model. The paper also designs an application based on a C/S architecture that supports submitting batch data and retrieving results quickly. With Word2Vec used for feature extraction, the paper compares the performance of several common classifiers on Spark. The results show that the multilayer perceptron performed best among the three classification algorithms compared and can classify the sentiment polarity of text fairly accurately.
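The two data-shaping steps in the abstract, class balancing and turning word vectors into one sentence vector, can be illustrated with a minimal pure-Python sketch. It rests on assumptions the abstract does not confirm: balancing is done here by downsampling to the smallest class, and the sentence vector is the mean of the word vectors. The function names `balance_classes` and `sentence_vector`, the toy embeddings, and the pre-segmented comments are all hypothetical; in the paper, segmentation is done by jieba and the vectors come from a trained Word2Vec model.

```python
import random
from collections import defaultdict


def balance_classes(samples, seed=0):
    """Downsample every class to the size of the smallest one.

    `samples` is a list of (tokens, label) pairs. Undersampling is an
    assumption; the abstract only says the classes are kept balanced.
    """
    by_label = defaultdict(list)
    for tokens, label in samples:
        by_label[label].append((tokens, label))
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return balanced


def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy, hand-written 2-dimensional "embeddings" (hypothetical values).
toy_vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "service": [0.0, 2.0]}

# Pre-segmented comments with labels (1 = positive, 0 = negative).
comments = [
    (["good", "service"], 1),
    (["good"], 1),
    (["good", "good"], 1),
    (["bad", "service"], 0),
]

balanced = balance_classes(comments)
features = [
    (sentence_vector(tokens, toy_vectors, 2), label)
    for tokens, label in balanced
]
```

The `features` list pairs each sentence vector with its label, which is the shape a classifier expects as training input. On Spark, one plausible counterpart (again an assumption, not something the abstract states) would be Spark ML's `Word2Vec` followed by `MultilayerPerceptronClassifier`.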
Authors
WANG Lei (王磊)
ZENG Cheng (曾诚)
XI Xuefeng (奚雪峰)
PI Zhou (皮洲)
GU Jianwei (顾建伟)
ZHUO Wenjie (卓文婕)
CHEN Shuaitian (陈帅天)
(School of Electronic and Information Engineering, SUST, Suzhou 215009, China; Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou 215009, China; Kunshan Public Security Bureau Command Center, Suzhou 215300, China)
Source
Journal of Suzhou University of Science and Technology (Natural Science Edition) (《苏州科技大学学报(自然科学版)》)
CAS
2018, Issue 1, pp. 71-75 (5 pages)
Funding
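National Natural Science Foundation of China (61472264, 61472267, 61673290)
Suzhou Science and Technology Development Program (Key Laboratory, SZS201609)
Suzhou Science and Technology Development Program (Industry Foresight Project, SYG201707)
Jiangsu Province Postgraduate Practice Innovation Program (SJCX17_0681)
2017 Jiangsu Province College Students' Innovation and Entrepreneurship Training Program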