Abstract
With the growth of new media such as Twitter and Weibo, and driven in part by online public relations and marketing, paid posters (spammers, the so-called "water army") have emerged and their numbers have increased sharply. They consume large amounts of network resources and ordinary users' time, and they significantly distort the authenticity of online public opinion. This paper builds a spammer detection model based on logistic regression. The behavioral and account attributes of Sina Weibo users are collected and preprocessed to form the experimental data set. These attributes are then analyzed with the cumulative distribution function (CDF), and the suitable ones, including the number of friends, the number of followers, text similarity, and URL rate, are selected as input parameters. Training the logistic regression classifier on these inputs yields the corresponding coefficients, which completes the construction of the spammer detection model. Experimental results demonstrate the accuracy and effectiveness of the model.
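The pipeline summarized in the abstract (compare per-class CDFs of each attribute, keep the attributes that separate the classes, then fit a logistic regression to obtain coefficients) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the data are synthetic toy values, the `noise` attribute is hypothetical, the maximum gap between the two empirical CDFs with a 0.5 threshold stands in for whatever selection criterion the authors applied, and plain batch gradient descent stands in for their training procedure.

```python
import math
import random


def ecdf(sample, x):
    """Empirical CDF: fraction of sample values that are <= x."""
    return sum(v <= x for v in sample) / len(sample)


def cdf_gap(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Label 1 (spammer) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Synthetic toy data (NOT the paper's data set): spammers get high text
# similarity and URL rate, normal users get low values, and a hypothetical
# "noise" attribute is drawn identically for both classes.
random.seed(0)
FEATURES = ["text_similarity", "url_rate", "noise"]


def make_user(is_spammer):
    lo, hi = (0.6, 1.0) if is_spammer else (0.0, 0.4)
    return [random.uniform(lo, hi), random.uniform(lo, hi), random.uniform(0.0, 1.0)]


labels = [1] * 100 + [0] * 100
users = [make_user(y == 1) for y in labels]

# Step 1: score each attribute by how far apart the two class CDFs are.
gaps = {}
for j, name in enumerate(FEATURES):
    spam = [u[j] for u, y in zip(users, labels) if y == 1]
    normal = [u[j] for u, y in zip(users, labels) if y == 0]
    gaps[name] = cdf_gap(spam, normal)

# Step 2: keep attributes whose CDFs differ clearly (0.5 is an assumed cutoff).
selected = [name for name in FEATURES if gaps[name] >= 0.5]

# Step 3: fit the logistic regression on the selected attributes only.
cols = [FEATURES.index(name) for name in selected]
X = [[u[j] for j in cols] for u in users]
w, b = train_logistic(X, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(X, labels)) / len(labels)

print("CDF gaps:", gaps)
print("selected:", selected)
print("coefficients:", w, "bias:", b, "training accuracy:", accuracy)
```

On real Weibo data the classes would overlap, so the accuracy printed here says nothing about the paper's results; the sketch only shows how the CDF comparison feeds the attribute choice and how the fitted coefficients define the classifier.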
Funding
National 973 Program of China (No. 2013CB329604)
National Natural Science Foundation of China (No. 61472433)