Abstract
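Driven by potential commercial interests, paid posters (水军, the "water army") run rampant across microblog topics and comments, distorting the public's view of actual outcomes and obstructing ordinary users from learning the facts. Taking the relationship graphs of normal users and paid posters as the starting point, this paper analyzes the characteristics of paid posters and extracts eight features from user attributes (fan count, following count, friend-to-fan ratio, registration time, activity level, follow rate, bidirectional-follow ratio, and mutual-follow count). A logistic regression classification model is trained on the learning dataset R; once reliable regression coefficients are obtained, the recognition sample set R is used for identification, and the paid-poster recognition rate reaches 98.770%. To verify whether the eight extracted features identify paid posters effectively, four classification methods from the Scikit-Learn machine learning library are applied to the same recognition sample set, and the recognition accuracy is above 98.688% in every case. The results show that the eight selected features discriminate paid posters effectively and that the logistic regression classification model offers high accuracy and reliability for paid-poster identification.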
Driven by potential commercial benefits, micro-blog public opinion viruses run rampant between topics and comments; they not only have a bad influence on people's understanding of the real result, but have also become an obstacle to normal users exploring the truth. This paper analyzed the diagram of normal users and public opinion viruses and took it as the starting point for the study. After analysing the characteristics of public opinion viruses, 8 features (number of fans, number of friends, the number of mutual concern, register time, activity, attention rate, the rate of mutual concern, and the rate of fans and friends) were extracted. After obtaining reliable regression coefficients by training a logistic regression classification model on a learning data set, the public opinion viruses recognition rate was as high as 98.770%. To verify that the 8 features could identify the public opinion viruses effectively, 4 kinds of classification methods in the Scikit-Learn machine learning repository were used to recognize public opinion viruses in the same sample set, and the recognition accuracy was above 98.688% in every case. The results show that the 8 selected features can identify public opinion viruses effectively and that the logistic regression model has high accuracy and reliability in the recognition of public opinion viruses.
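As a rough illustration of the workflow the abstract describes (extract eight account features, fit a logistic regression classifier, then score a held-out sample), the following is a minimal sketch in plain Python. It is not the authors' implementation: the feature formulas, the synthetic account generator, and every name in it (`make_account`, `fit_scaler`, `train`, `predict`) are assumptions made for illustration only, and the abstract does not give the exact feature definitions.

```python
import math
import random


def make_account(is_spammer, rng):
    """Draw one synthetic account. All ranges and formulas are invented."""
    fans = rng.uniform(0, 50) if is_spammer else rng.uniform(50, 2000)
    friends = rng.uniform(800, 2000) if is_spammer else rng.uniform(50, 400)
    age_days = rng.uniform(10, 150) if is_spammer else rng.uniform(300, 2000)
    posts = rng.uniform(0, 50) if is_spammer else rng.uniform(50, 3000)
    mutual = rng.uniform(0, min(fans, friends))
    return [
        fans,                      # number of fans
        friends,                   # number of accounts followed
        friends / (fans + 1.0),    # friend-to-fan ratio (assumed formula)
        age_days,                  # time since registration
        posts / age_days,          # activity level (assumed formula)
        friends / age_days,        # follow rate (assumed formula)
        mutual / (friends + 1.0),  # bidirectional-follow ratio (assumed)
        mutual,                    # number of mutual follows
    ]


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_scaler(rows):
    """Per-feature mean and standard deviation of the training rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train(X, y, lr=0.3, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi):
    """Return 1 (paid poster) or 0 (normal user)."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0


rng = random.Random(0)
labels = [i % 2 for i in range(400)]  # 1 = paid poster, 0 = normal user
rng.shuffle(labels)
rows = [make_account(label == 1, rng) for label in labels]

# Hold out the last 100 accounts as the recognition sample.
means, stds = fit_scaler(rows[:300])
X_train = [scale(r, means, stds) for r in rows[:300]]
X_test = [scale(r, means, stds) for r in rows[300:]]
y_train, y_test = labels[:300], labels[300:]

w, b = train(X_train, y_train)
hits = sum(predict(w, b, x) == y for x, y in zip(X_test, y_test))
accuracy = hits / len(y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the artificially well-separated toy data and says nothing about the figures reported in the abstract. Standardizing the features before training is a design choice of this sketch, made because the raw counts and ratios sit on very different scales.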
Source
《微型机与应用》
2017, No. 16, pp. 67-69, 72 (4 pages in total)
Microcomputer & Its Applications
Funding
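Nanjing Agricultural University, Fundamental Research Funds for the Central Universities, Humanities and Social Sciences Research Fund project (SK2015023)
National Social Science Fund of China project (13CTQ031)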
Keywords
micro-blog (微博)
public opinion viruses (水军)
logistic regression (逻辑回归)