Design and Implementation of Intelligent Information Collection and Processing System Based on Web (cited by: 9)
Abstract: As the volume of information on the Internet keeps growing, how to collect and make use of Web information is drawing increasing attention. This paper designs and implements a Web-based intelligent information collection and processing system. Efficient URL deduplication and a template-based download mechanism improve the performance of Web resource collection. Natural language processing techniques are applied to classify and summarize the collected information automatically, and the publishing stage emphasizes personalized information services. Compared with similar systems, the proposed system shows clear advantages in both intelligence and practicality.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2007, No. 18, pp. 265-267 (3 pages)
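Funding: Key Project supported by the Science and Technology Fund of the Ministry of Education (教技司[2000175); Beijing Natural Science Foundation (4022008)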
Keywords: Web collection; URL deduplication; intelligent information processing; personalized publishing
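
The abstract names URL deduplication as one source of the crawler's efficiency but does not say how it is done. As an illustration only, the following is a minimal sketch of one common approach: normalize each URL, hash it, and keep the digests in a set. The names `normalize_url` and `SeenUrls` are hypothetical and are not taken from the paper, and the normalization rules shown are assumptions rather than the authors' method.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal.

    Assumed rules (not from the paper): lowercase scheme and host,
    drop default ports, drop the fragment, use "/" for an empty path.
    Userinfo (user:password@) is discarded by this sketch.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    default_port = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not default_port:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, host, path, parts.query, ""))


class SeenUrls:
    """Set of URL digests; a fixed-size digest keeps memory per URL constant."""

    def __init__(self) -> None:
        self._digests = set()

    def add(self, url: str) -> bool:
        """Record the URL. Return True if it is new, False if already seen."""
        digest = hashlib.sha256(normalize_url(url).encode("utf-8")).digest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True


if __name__ == "__main__":
    seen = SeenUrls()
    for candidate in ("HTTP://Example.com:80/a#top", "http://example.com/a"):
        print(candidate, "new" if seen.add(candidate) else "duplicate")
```

A crawler would call `add` before queueing a link and skip the download when it returns `False`. For very large crawls, a Bloom filter is a frequently used alternative that trades a small false-positive rate for lower memory use.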
  • Related literature

References: 6

Secondary references: 16

Co-citing documents: 62

Co-cited documents: 40

Citing documents: 9

Secondary citing documents: 22
