期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
PathMarker:protecting web contents against inside crawlers
1
作者 Shengye Wan Yue Li Kun Sun 《Cybersecurity》 CSCD 2019年第1期100-116,共17页
Web crawlers have been misused for several malicious purposes such as downloading server data without permission from the website administrator.Moreover,armoured crawlers are evolving against new anti-crawler mechanis... Web crawlers have been misused for several malicious purposes such as downloading server data without permission from the website administrator.Moreover,armoured crawlers are evolving against new anti-crawler mechanisms in the arm races between crawler developers and crawler defenders.In this paper,based on one observation that normal users and malicious crawlers have different short-term and long-term download behaviours,we develop a new anti-crawler mechanism called PathMarker to detect and constrain persistent distributed crawlers.By adding a marker to each Uniform Resource Locator(URL),we can trace the page that leads to the access of this URL and the user identity who accesses this URL.With this supporting information,we can not only perform more accurate heuristic detection using the path related features,but also develop a Support Vector Machine based machine learning detection model to distinguish malicious crawlers from normal users via inspecting their different patterns of URL visiting paths and URL visiting timings.In addition to effectively detecting crawlers at the earliest stage,PathMarker can dramatically suppress the scraping efficiency of crawlers before they are detected.We deploy our approach on an online forum website,and the evaluation results show that PathMarker can quickly capture all 6 open-source and in-house crawlers,plus two external crawlers(i.e.,Googlebots and Yahoo Slurp). 展开更多
关键词 Anti-crawler mechanism Stealthy distributed inside crawler Confidential Website content protection
原文传递
PathMarker:protecting web contents against inside crawlers
2
作者 Shengye Wan Yue Li Kun Sun 《Cybersecurity》 2018年第1期375-391,共17页
Web crawlers have been misused for several malicious purposes such as downloading server data without permission from the website administrator.Moreover,armoured crawlers are evolving against new anti-crawler mechanis... Web crawlers have been misused for several malicious purposes such as downloading server data without permission from the website administrator.Moreover,armoured crawlers are evolving against new anti-crawler mechanisms in the arm races between crawler developers and crawler defenders.In this paper,based on one observation that normal users and malicious crawlers have different short-term and long-term download behaviours,we develop a new anti-crawler mechanism called PathMarker to detect and constrain persistent distributed crawlers.By adding a marker to each Uniform Resource Locator(URL),we can trace the page that leads to the access of this URL and the user identity who accesses this URL.With this supporting information,we can not only perform more accurate heuristic detection using the path related features,but also develop a Support Vector Machine based machine learning detection model to distinguish malicious crawlers from normal users via inspecting their different patterns of URL visiting paths and URL visiting timings.In addition to effectively detecting crawlers at the earliest stage,PathMarker can dramatically suppress the scraping efficiency of crawlers before they are detected.We deploy our approach on an online forum website,and the evaluation results show that PathMarker can quickly capture all 6 open-source and in-house crawlers,plus two external crawlers(i.e.,Googlebots and Yahoo Slurp). 展开更多
关键词 Anti-crawler mechanism Stealthy distributed inside crawler Confidential Website content protection
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部