Webpage Content Monitoring and Mining System Based on Visiting Log
Abstract A URL is an identifier that fully describes the address of a webpage or other resource on the Internet, and URL access logs record the traces users leave while browsing. Building on this property, the paper proposes a webpage content monitoring and mining system based on access logs that automates the whole process of webpage content crawling, monitoring, analysis, and report generation. Operational tests show that the system achieves a high accuracy rate, meets its design requirements, and can effectively address the network supervision problems faced by Internet operators and government regulatory departments.
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2011, Issue 4, pp. 70-72 (3 pages)
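Funding Joint Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (60776816); Key Project of the Natural Science Foundation of Guangdong Province (8251064101000005)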
Keywords user visiting log; webpage content mining; webpage classification
