
Application of Regular Expressions in a Web Information Monitoring and Analysis System
Abstract: In a web information monitoring system, regular expressions and HTMLparser are used to recursively match the HTML code of web pages, so that an entire website can be parsed. In practical use, the delay between a new item being published and being captured was under five minutes, and no item was missed, skipped, or captured twice. The system is implemented in Java. The English abstract reports a precision of 98% and the Chinese abstract reports 99%; both give an omission rate of 0.
Source: Information Technology (《信息技术》), 2008, Issue 4, pp. 33-34 (2 pages)
Funding: Supported by a project of a company in Shanghai
Keywords: regular expression; web monitoring; information crawling
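
The abstract gives no implementation details, so the following is only a minimal sketch of what recursive regular-expression matching over a site's HTML might look like. It assumes the pages are already available as an in-memory map from URL to HTML, and it uses only `java.util.regex` rather than HTMLparser. The class name `RegexCrawlSketch`, the methods `extractLinks` and `crawl`, the link pattern, and the sample pages are all hypothetical and are not taken from the paper.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the system described in the paper.
class RegexCrawlSketch {
    // Matches href="..." or href='...' inside an <a> tag; group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every link target found in the given HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // Walks the site depth-first from the start URL and returns pages in capture order.
    static List<String> crawl(String start, Map<String, String> site) {
        Set<String> visited = new LinkedHashSet<>();
        visit(start, site, visited);
        return new ArrayList<>(visited);
    }

    private static void visit(String url, Map<String, String> site, Set<String> visited) {
        String html = site.get(url);
        // Skip pages that are not part of the site or were already captured,
        // so no page is processed twice.
        if (html == null || !visited.add(url)) {
            return;
        }
        for (String link : extractLinks(html)) {
            visit(link, site, visited);
        }
    }

    public static void main(String[] args) {
        // Assumed sample site; a real system would fetch pages over HTTP.
        Map<String, String> site = Map.of(
                "/index.html", "<a href=\"/news.html\">News</a> <a href='/about.html'>About</a>",
                "/news.html", "<A HREF=\"/index.html\">Home</A> <a href=\"/item1.html\">Item</a>",
                "/about.html", "<p>No links here</p>",
                "/item1.html", "<a href=\"/news.html\">Back</a>");
        System.out.println(crawl("/index.html", site));
    }
}
```

The visited set is what prevents repeated capture when pages link back to each other. For a large site, an explicit queue would likely be safer than recursion, and a real HTML parser would handle malformed markup that a single pattern cannot.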