Abstract
A website content probing system collects web pages and analyzes the URLs, text, images, audio, and video they contain in order to promptly detect illegal or unhealthy content on websites and individual pages. It also classifies websites by matching keywords against site titles, checking site registration records, and analyzing the network protocol types a site uses. The data the system collects and analyzes, covering both page content and user access behavior, can support trend analysis and forecasting for Internet services and give enterprises a strong technical basis for fine-grained operations.
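The title-keyword matching step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category table `CATEGORY_KEYWORDS`, the function names, and the whole-word scoring rule are all assumptions made for illustration, and the registration-record and protocol checks are not covered.

```python
import re
from html.parser import HTMLParser

# Hypothetical category -> keyword table; the paper does not list its keywords.
CATEGORY_KEYWORDS = {
    "news": ["news", "daily", "headline"],
    "video": ["video", "movie", "tv"],
    "shopping": ["shop", "store", "mall"],
}


class TitleParser(HTMLParser):
    """Collects the text found inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text, or an empty string if none exists."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def classify_by_title(html):
    """Pick the category whose keywords appear most often as whole words
    in the title; return "unknown" when no keyword matches (assumed rule)."""
    words = set(re.findall(r"\w+", extract_title(html).lower()))
    scores = {
        category: sum(1 for keyword in keywords if keyword in words)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

For example, `classify_by_title("<title>Daily News Headline</title>")` selects the `news` category under this table. A production system would presumably combine such a score with the other signals the abstract names rather than rely on titles alone.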
Source
《现代电信科技》
2010, No. 5, pp. 60-65 (6 pages)
Modern Science & Technology of Telecommunications
Keywords
web crawler
soft probe
bypass
gateway filtering
website classification