Abstract
External links are an important factor in search engine ranking algorithms, and link spam is therefore widespread on the Internet. Hidden hyperlinks are one form of link spam; they are hard to detect and remove, and have been called the "psoriasis" of the Internet. To maintain a fair search engine ranking mechanism and ensure the quality of search results, this paper proposes a machine learning-based method for detecting hidden hyperlinks that uses features of the anchor text in the HTML source of Web pages. A performance analysis is given, and experiments in a real Internet environment show that the proposed method is feasible and effective. The study provides theoretical and practical support for search engines in combating link-hiding spam.
Source
《计算机应用研究》
CSCD
Peking University Core Journals
2015, No. 9, pp. 2779-2783 (5 pages)
Application Research of Computers
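Funding
National Natural Science Foundation of China (61375039, 61005029)
Special Fund for Key Cultivation Directions of the "One-Three-Five" Plan, Computer Network Information Center, Chinese Academy of Sciences (CNIC_PY_1402)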
Keywords
hidden hyperlink
hyperlink hiding techniques
anchor text
machine learning
text classification