Abstract
Most current micro-blog sentiment analysis algorithms have difficulty accurately characterizing the differences between emotions. To address this problem, this paper studies the emotional components and hierarchical emotion classification of Chinese micro-blog posts. Non-emotional information is removed during preprocessing. The ICTCLAS word segmentation toolkit is used to segment the text and extract adjectives, nouns, and verbs, which form the features. Features are selected using the χ² test, word frequency, and pointwise mutual information (PMI). Classification is then performed with support vector regression (SVR) and rule sets. Experiments use original Chinese micro-blog posts from Sina as the data set. Results across different experimental groups verify the effectiveness of the method: its F-measure and related metrics exceed those of comparable methods at multiple levels of the hierarchy. In a manual evaluation of 50 randomly selected posts, nearly half of the results were endorsed by all judges.
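The feature-selection step combines word frequency, the χ² test, and PMI. The abstract does not say how the three scores are combined, so the following Python sketch assumes a hypothetical rule: keep terms that meet a frequency threshold and have positive PMI toward the target emotion, then rank them by χ². The input token lists are assumed to come from a segmenter such as ICTCLAS after POS filtering (adjectives, nouns, verbs). The function names `feature_scores` and `select_features` and the `min_tf` parameter are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def feature_scores(docs, labels, target):
    """Score each term against one emotion class.

    docs: list of token lists (assumed already segmented and POS-filtered).
    labels: emotion label per document.
    target: the emotion class the terms are scored against.
    Returns {term: {"tf": ..., "chi2": ..., "pmi": ...}}.
    """
    n = len(docs)
    n_c = sum(1 for y in labels if y == target)  # documents in the target class
    tf = Counter()    # raw term frequency over the whole corpus
    df = Counter()    # number of documents containing the term
    df_c = Counter()  # number of target-class documents containing the term
    for tokens, y in zip(docs, labels):
        tf.update(tokens)
        for t in set(tokens):
            df[t] += 1
            if y == target:
                df_c[t] += 1

    scores = {}
    for t in df:
        a = df_c[t]          # term present, document in class
        b = df[t] - a        # term present, document not in class
        c = n_c - a          # term absent, document in class
        d = n - n_c - b      # term absent, document not in class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # PMI(t, c) = log2( P(t, c) / (P(t) * P(c)) ), using document counts
        pmi = math.log2(a * n / ((a + b) * n_c)) if a else float("-inf")
        scores[t] = {"tf": tf[t], "chi2": chi2, "pmi": pmi}
    return scores


def select_features(scores, k, min_tf=2):
    """Hypothetical combination rule: frequency filter, positive PMI, then top-k by chi2."""
    kept = [t for t, s in scores.items() if s["tf"] >= min_tf and s["pmi"] > 0]
    return sorted(kept, key=lambda t: scores[t]["chi2"], reverse=True)[:k]
```

The χ² statistic is symmetric, so a term that strongly signals the *absence* of an emotion scores as high as one that signals its presence; the positive-PMI filter in this sketch keeps only terms positively associated with the target class. For example, with two "joy" posts `["开心", "阳光", "朋友"]` and `["开心", "朋友"]` and two "sad" posts `["难过", "下雨"]` and `["难过", "朋友"]`, both 开心 and 难过 get χ² = 4 for "joy", but only 开心 survives the PMI filter.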
Authors
WANG Xiang-hua, SONG Xin (College of Electronic Information Engineering, Tianjin Vocational Institute, Tianjin 300410, China)
Source
Computer Engineering and Design
Peking University Core Journal
2018, Issue 11, pp. 3431-3437 (7 pages)
Funding
Tianjin Basic Research Program Fund Project (14JCTPJC00553)
Tianjin Higher Education Science and Technology Development Fund Program (20130711)
Keywords
micro-blog
emotion classification
pointwise mutual information
emotional component analysis
support vector regression