Abstract
Splog (spam blog) filtering has become an active international research area in recent years. Most existing filtering algorithms classify on word-frequency features, which are redundant and lack relevance to one another. To address this problem, a dynamic splog filtering algorithm based on combined features (CFDSD) is proposed. The algorithm uses author attributes and self-similarity features to reduce feature redundancy and low relevance, and applies the Bayesian classification algorithm to optimize word-frequency feature classification. Experiments show that the algorithm adapts to the way blogs are dynamically updated over time, and that it improves filtering efficiency while reducing the time needed to filter splogs.
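As a rough illustration of the word-frequency component only, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on term counts. It is not the CFDSD algorithm: the author-attribute and self-similarity features are not modeled, and the function names (`train_naive_bayes`, `classify`), the labels, and the toy posts are all hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and term frequencies."""
    class_docs = Counter(labels)  # number of training posts per class
    term_counts = {label: Counter() for label in class_docs}
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
    vocab = {term for counts in term_counts.values() for term in counts}
    return class_docs, term_counts, vocab


def classify(model, tokens):
    """Return the class with the highest log posterior for a tokenized post."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior from class frequencies in the training set.
        score = math.log(class_docs[label] / total_docs)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in tokens:
            if term in vocab:  # terms never seen in training are skipped
                score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
train_docs = [
    "buy cheap pills buy now".split(),
    "cheap loans click now".split(),
    "my trip to the mountains".split(),
    "notes on the conference talk".split(),
]
train_labels = ["splog", "splog", "blog", "blog"]
model = train_naive_bayes(train_docs, train_labels)

print(classify(model, "buy cheap loans".split()))
print(classify(model, "the conference trip".split()))
```

A combined-feature approach of the kind the abstract describes would presumably feed additional signals, such as author attributes, into the decision alongside these term-frequency scores; how CFDSD combines them is not stated in the abstract.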
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2012, Issue 5, pp. 177-179, 212 (4 pages)
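Funding
National Natural Science Foundation of China (60603047)
Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education
Liaoning Provincial Science and Technology Program (2008216014)
Liaoning Provincial Department of Education Research Fund for Higher Education Institutions (L2010229)
Dalian Outstanding Young Scientific and Technological Talent Fund (2008J23JH026)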
Keywords
Splog filtering
Term frequency features
Self-similarity features
Combined features
Bayesian classification