Abstract
A URL is an identifier that fully describes the address of a webpage or other resource on the Internet, and URL access logs record the traces users leave while browsing. Building on this property, the paper proposes a webpage content monitoring and mining system based on user access logs and discusses its key techniques. The system automates the whole process of webpage content fetching, monitoring, analysis, and report generation. Test results show that the system achieves a high accuracy rate, meets its design requirements, and can effectively address the network supervision problems faced by Internet operators and government regulators.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journals (北大核心)
2011, Issue 4, pp. 70-72 (3 pages)
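Funding
Joint Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (60776816)
Key Project of the Natural Science Foundation of Guangdong Province (8251064101000005)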
Keywords
user access log
webpage content mining
webpage classification