Abstract
The emotional dictionary is enriched by expanding its base vocabulary, building a new neutral-word dictionary, and adding popular Internet buzzwords, which improves the accuracy of matching emotional words after word segmentation. Reviews posted by Internet users on a review website serve as the raw data. After segmentation, features are extracted from each review: the positive emotion score, the negative emotion score, the number of neutral emotional words, and the total emotion score of the review. Discrete attributes are derived by reducing the continuous features, and a decision tree is generated according to the maximum information gain principle to classify the emotion of each review. Two experiments are performed after low-probability nodes are removed: the recognition rate reaches 90% for positive reviews, 92% for negative reviews, and 75% for medium (neutral) reviews.
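The pipeline summarized above (dictionary matching, feature extraction, discretization of continuous features, information-gain tree growing, and pruning of small nodes) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy lexicons and weights, the discretization thresholds, the definition total = positive − negative, and the `min_samples` cutoff used as a stand-in for "removing low-probability nodes" are all hypothetical. Tokens are assumed to be already produced by a word segmenter.

```python
import math
from collections import Counter

# Hypothetical toy lexicons (word -> weight); the paper's dictionaries are not reproduced here.
POSITIVE = {"good": 2.0, "great": 3.0, "tasty": 2.0}
NEGATIVE = {"bad": 2.0, "slow": 1.0, "awful": 3.0}
NEUTRAL = {"ok", "average"}

def extract_features(tokens):
    """Positive score, negative score, neutral-word count, and an assumed total = pos - neg."""
    pos = sum(POSITIVE.get(t, 0.0) for t in tokens)
    neg = sum(NEGATIVE.get(t, 0.0) for t in tokens)
    neu = sum(1 for t in tokens if t in NEUTRAL)
    return {"pos": pos, "neg": neg, "neu": neu, "total": pos - neg}

def discretize(f):
    """Map continuous features to discrete levels (thresholds are illustrative assumptions)."""
    def level(x, low, high):
        return "low" if x < low else ("mid" if x < high else "high")
    total = "negative" if f["total"] < 0 else ("zero" if f["total"] == 0 else "positive")
    return {"pos": level(f["pos"], 1, 3), "neg": level(f["neg"], 1, 3),
            "neu": "none" if f["neu"] == 0 else "some", "total": total}

def entropy(labels):
    """Shannon entropy (bits) of a non-empty label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs, min_samples=2):
    """ID3-style tree; nodes with fewer than min_samples reviews become majority-class leaves."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs or len(labels) < min_samples:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))  # ties: first in attrs
    node = {"attr": best, "default": majority, "children": {}}
    rest = [a for a in attrs if a != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest, min_samples)
    return node

def classify(tree, row):
    """Follow branches to a leaf; unseen attribute values fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

# Hypothetical pre-segmented training reviews with their labels.
reviews = [
    (["great", "tasty"], "praise"), (["good", "good"], "praise"),
    (["bad", "awful"], "bad"), (["slow", "bad"], "bad"),
    (["ok", "average"], "medium"), (["ok", "good", "slow"], "medium"),
]
rows = [discretize(extract_features(toks)) for toks, _ in reviews]
labels = [lab for _, lab in reviews]
tree = build_tree(rows, labels, ["pos", "neg", "neu", "total"])
print(classify(tree, discretize(extract_features(["tasty", "great"]))))  # → praise
```

Multiway splits on discrete attributes follow the maximum-information-gain criterion directly, which is why the continuous scores are discretized first; a real system would tune the thresholds and the pruning rule on held-out data.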
Authors
LIU Gang, ZHANG Wei-shi (Dalian Maritime University, Dalian 116026)
Funding
Supported by the Fundamental Research Funds for the Central Universities (Grant No. 3132016308)
Supported by the Science and Technology Funds of Dalian (Grant No. 2015A11GX010)
Keywords
Emotional dictionary
Feature
Information gain
Decision tree