Abstract
Sentiment analysis of short texts based on social theory is an important topic in text sentiment analysis. Existing work on short-text sentiment generally captures only simple friend relationships between users and does not explore in depth how sentiment propagates among them. To address this, a new sentiment scoring method, the statistics emotional lexicon method (SELM), is constructed. It divides users into star users and ordinary users according to their number of followers, computes a social-relationship influence score from the ratio of the number of accounts a user follows to that user's number of followers, and combines this influence score with the SentiWordNet sentiment lexicon to compute sentiment scores for short Twitter texts. In addition, the sociological approach to handling noisy and short texts (SANT) is improved to yield an enhanced SANT (ESANT) model. Unlike SANT, ESANT strengthens the social relationships between users when modeling "information-information relationships", so as to represent deeper sentiment propagation. During training of the ESANT model, the synthetic minority oversampling technique (SMOTE) is used to address class imbalance in the experimental dataset. Finally, the dataset is partitioned using the SELM scoring method and the ESANT model is retrained. Experiments show that combining the SELM scoring method with the ESANT model improves sentiment classification performance.
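The abstract names SMOTE as the remedy for class imbalance but does not describe how it was configured. As a rough illustration of the general technique only, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic vectors built from minority-class vectors.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four 2-D feature vectors (made-up values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because each synthetic point is a convex combination of two existing minority points, it stays inside the region the minority class already occupies. In practice the synthetic points would be added to the training split only, so that evaluation data remains untouched.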
Authors
LIU Shudong (刘树栋)
WANG Lei (王磊)
WU Jinglong (武璟珑)
XU Liang (徐亮)
(Centre for Artificial Intelligence and Applied Research, Zhongnan University of Economics and Law, Wuhan 430073, China; School of Information and Security Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China; Information Centre, China Electronic Information Industry Group Co., Ltd., Beijing 100190, China; Business Growth Department, JD, Beijing 100176, China)
Source
Engineering Journal of Wuhan University (《武汉大学学报(工学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2020, Issue 9, pp. 838-846 (9 pages)
Funding
National Natural Science Foundation of China (Nos. 61602518, 71872180)
Fundamental Research Funds for the Central Universities (Nos. 2722019JCG074, 2722019JCT035)
Keywords
sentiment analysis
sentimental transmission
sentiment scoring method
social relationships
short text