Abstract
Mainstream Chinese microblog sentiment analysis evaluates a method by the quality of its sentiment polarity classification. To improve the accuracy of microblog sentiment judgment, this paper takes into account as many of the factors that affect microblog sentiment as possible. Building on counts of the sentiment words in a microblog, it adds emoticons as an important factor and combines them with the text sentiment value by weighting, which improves, to some extent, the results for microblogs that contain emoticons. The semantic rules cover the most common Chinese sentence-pattern rules and inter-sentence relationship rules, making sentiment analysis of complex sentences more accurate. In addition, a specific sentiment value is computed for each microblog and, on top of precision, recall, and F-measure, a qualification rate metric is proposed to evaluate the accuracy of these values. Tests on the 10,000 items of the test set, run on a purpose-built Hadoop platform, verify the effectiveness of the fusion algorithm.
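The weighted combination of text and emoticon sentiment, and the precision, recall, and F-measure used for evaluation, can be sketched roughly as follows. The abstract does not give the actual formula, weights, or thresholds, so the function names, the `emoticon_weight` value of 0.3, and the zero threshold are all hypothetical; the qualification rate is not sketched because its definition is not stated here.

```python
def fuse_sentiment(text_score, emoticon_scores, emoticon_weight=0.3):
    """Combine a text sentiment score with the mean emoticon score.

    emoticon_weight is an assumed value, not one taken from the paper.
    """
    if not emoticon_scores:
        # No emoticons: fall back to the text score alone.
        return text_score
    emoticon_score = sum(emoticon_scores) / len(emoticon_scores)
    return (1 - emoticon_weight) * text_score + emoticon_weight * emoticon_score


def polarity(score, threshold=0.0):
    """Map a numeric sentiment score to a polarity label (assumed threshold)."""
    if score > threshold:
        return "positive"
    if score < threshold:
        return "negative"
    return "neutral"


def precision_recall_f(gold, predicted, target="positive"):
    """Standard precision, recall, and F-measure for one polarity class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # A mildly negative text with two positive emoticons.
    score = fuse_sentiment(-0.2, [0.8, 0.6])
    print(score, polarity(score))
```

With a linear weighting like this, a strongly positive emoticon can flip the polarity of a weakly negative text, which is the kind of case where emoticon information would be expected to matter.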
Authors
赵天奇
姚海鹏
方超
张俊东
张培颖
ZHAO Tianqi; YAO Haipeng; FANG Chao; ZHANG Jundong; ZHANG Peiying (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, P. R. China)
Source
《重庆邮电大学学报(自然科学版)》
CSCD
Peking University Core Journal
2016, No. 4, pp. 503-510 (8 pages)
Journal of Chongqing University of Posts and Telecommunications(Natural Science Edition)
Funding
National Natural Science Foundation of China (61471056)
Jiangsu Province Science and Technology Program (BY2013095-3-1, BY2013095-3-03)
Keywords
microblog
sentiment analysis
semantic rules
microblog emoticon