Funding: Project (2007CB311106) supported by the National Basic Research Program of China; Project (242-2009A82) supported by the National Information Security Special Plan Program of China.
Abstract: With the great commercial success of several IPTV (Internet Protocol Television) applications, PPLive has received increasing attention from both industry and academia. At present, PPLive is one of the most popular IPTV applications, attracting a large number of users across the globe; however, its dramatic rise in popularity also makes it a more likely target for attacks. The main contribution of this work is twofold. First, a dedicated distributed crawler system is proposed and its crawling performance analyzed; the crawler is used to evaluate the impact of pollution attacks in a P2P live streaming system. The measurement results reveal that a crawler system with a distributed architecture can capture PPLive overlay snapshots more efficiently than previous crawlers. To the best of our knowledge, this is the first study to apply a distributed architecture to crawler design and to examine the crawling performance of capturing accurate overlay snapshots for a P2P live streaming system. Second, a feasible and effective pollution architecture is proposed to deploy a content pollution attack in a real-world P2P live streaming system, PPLive, and the impact of the attack is evaluated in depth from the following five aspects: the dynamic evolution of participating users, user lifetime characteristics, user connectivity performance, the dynamic evolution of uploaded polluted chunks, and the dynamic evolution of the pollution ratio. Notably, the experimental results show that a single polluter is capable of compromising the entire system, and its destructiveness is severe.
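The distributed crawling idea summarized above — several crawler nodes each taking a disjoint slice of the known peers, querying neighbour lists in parallel, and merging the partial results into one overlay snapshot — can be sketched roughly as follows. This is a minimal illustration only: all names are hypothetical, `query_peer` stands in for the paper's actual PPLive protocol exchange, and no real network I/O is performed.

```python
# Hypothetical sketch of distributed overlay-snapshot crawling.
# Each worker crawls a disjoint slice of the known peers; the
# coordinator merges the partial neighbour maps into one snapshot.
from concurrent.futures import ThreadPoolExecutor


def query_peer(peer_id):
    # Placeholder for a real "request neighbour list" exchange.
    # Returns a fabricated, deterministic neighbour set for the demo.
    return {f"peer-{(peer_id + k) % 100}" for k in (1, 2, 3)}


def crawl_partition(peer_ids):
    """One crawler node: fetch neighbour lists for its slice of peers."""
    return {f"peer-{p}": query_peer(p) for p in peer_ids}


def capture_snapshot(all_peers, n_workers=4):
    """Coordinator: split peers across workers, merge partial snapshots."""
    slices = [all_peers[i::n_workers] for i in range(n_workers)]
    snapshot = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(crawl_partition, slices):
            snapshot.update(partial)
    return snapshot


snapshot = capture_snapshot(list(range(10)))
print(len(snapshot))  # 10 peers crawled
```

The key property the paper exploits is that the slices are crawled concurrently, so the whole snapshot is captured in a shorter window and is therefore a more accurate instantaneous view of the overlay.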
Funding: This work is supported by the U.S. Office of Naval Research under grants N00014-16-1-3214 and N00014-16-1-3216.
Abstract: Web crawlers have been misused for several malicious purposes, such as downloading server data without permission from the website administrator. Moreover, armoured crawlers are evolving against new anti-crawler mechanisms in the arms race between crawler developers and crawler defenders. In this paper, based on the observation that normal users and malicious crawlers exhibit different short-term and long-term download behaviours, we develop a new anti-crawler mechanism called PathMarker to detect and constrain persistent distributed crawlers. By adding a marker to each Uniform Resource Locator (URL), we can trace both the page that leads to the access of this URL and the identity of the user who accesses it. With this supporting information, we can not only perform more accurate heuristic detection using path-related features, but also build a Support Vector Machine (SVM)-based machine learning detection model that distinguishes malicious crawlers from normal users by inspecting their different patterns of URL visiting paths and URL visiting timings. In addition to detecting crawlers effectively at the earliest stage, PathMarker can dramatically suppress the scraping efficiency of crawlers before they are detected. We deploy our approach on an online forum website, and the evaluation results show that PathMarker can quickly capture all six open-source and in-house crawlers, plus two external crawlers (i.e., Googlebot and Yahoo Slurp).
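The URL-marker idea described above — each link on a served page carries a token recording the page it was embedded in and the user it was served to, so the server can later reconstruct visiting paths — might look roughly like the sketch below. This is an assumption-laden illustration, not the paper's implementation: the paper encrypts its markers, whereas this demo merely attaches an HMAC tag for integrity, and all names (`add_marker`, `read_marker`, the `m` query parameter) are hypothetical.

```python
# Hypothetical sketch of PathMarker-style URL marking.
# The marker encodes (parent page, user id) and an HMAC tag so a
# tampered marker is rejected; a real deployment would encrypt it.
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # illustrative key, kept server-side


def add_marker(url, parent_page, user_id):
    """Append a marker recording where this link was served and to whom."""
    payload = f"{parent_page}|{user_id}"
    tag = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()[:16]
    token = base64.urlsafe_b64encode(f"{payload}|{tag}".encode()).decode()
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{urlencode({'m': token})}"


def read_marker(url):
    """Recover (parent page, user id) from a marked URL, verifying the tag."""
    token = parse_qs(urlparse(url).query)["m"][0]
    parent, user, tag = base64.urlsafe_b64decode(token).decode().rsplit("|", 2)
    expected = hmac.new(SECRET, f"{parent}|{user}".encode(),
                        hashlib.sha256).hexdigest()[:16]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("marker tampered with")
    return parent, user


marked = add_marker("https://forum.example/thread/42", "/index", "user-7")
print(read_marker(marked))  # ('/index', 'user-7')
```

A sequence of such markers observed in the server log yields exactly the per-user URL visiting paths that the heuristic features and the SVM model described in the abstract operate on.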