Abstract
To address the characteristics of Chinese microblog spam, new classification features are extracted: Chinese text similarity based on the vector space model (VSM), similarity between long and short URLs, and regularity of posting times. These features are added to the existing feature set, and a support vector machine (SVM) is trained on the combined set to obtain a classification model. Experimental results show that the proposed method is an effective technique for identifying spam microblogs.
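As a rough illustration of the VSM-based text similarity feature named in the abstract, the sketch below computes the cosine similarity between term-frequency vectors of two posts and averages it over all pairs of posts from one account. The abstract does not specify tokenization, weighting, or aggregation, so the character-bigram tokens, raw term counts, and pairwise mean used here are assumptions, and the function names `char_bigrams`, `cosine_similarity`, and `mean_pairwise_similarity` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def char_bigrams(text):
    """Split text into overlapping character bigrams (assumed tokenization)."""
    text = "".join(text.split())  # drop all whitespace
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    vec_a = Counter(char_bigrams(text_a))
    vec_b = Counter(char_bigrams(text_b))
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(posts):
    """Average similarity over all pairs of posts (assumed aggregation)."""
    pairs = list(combinations(posts, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

An account that repeatedly posts near-identical text would score close to 1 under this measure. In a pipeline like the one the abstract outlines, such a score would be one column of the feature vector passed to the SVM.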
Source
Journal of Anhui University of Technology (Natural Science)
CAS
2013, No. 4, pp. 440-445 (6 pages)
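Funding
National Natural Science Foundation of China (61003311)
Open Project Fund of the Jiangsu Provincial Key Laboratory of Network and Information Security (BM2003201-201006)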
Keywords
status features
user profile features
support vector machine
spam microblog identification