Research on Page Segmentation Techniques in Web Information Collection (cited by: 2)

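Abstract: This paper proposes a model of an information collection system based on Web page segmentation. In the page analysis stage, it introduces an improved vision-based page segmentation method, a top-down approach that is independent of the tag tree and is intended to detect the content structure of a Web page; the experimental results are reported as satisfactory. Working on the segmented page, an empirical, interactive noise-removal algorithm, Page_CN, defines the page's noise intervals and removes the noise regions, yielding more clearly delimited topic regions.
Author: Xu Wei (徐薇)
Source: Journal of Wuhan Institute of Science and Technology (《武汉科技学院学报》), 2007, No. 5, pp. 43-45 (3 pages)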
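
The two stages named in the abstract (top-down segmentation, then interval-based noise removal) can be illustrated with a minimal sketch. It is not the paper's method: the abstract gives no details of the improved vision-based segmentation or of Page_CN, so everything here is an assumption made for illustration. The `Block` type, the use of link density as the noise measure, and the interval `(0.6, 1.0)` are all hypothetical, and the hand-built block tree merely stands in for regions that a real system would detect from visual cues such as position, font, and separators.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """A page region. All fields are illustrative assumptions, not from the paper."""
    name: str
    text_len: int = 0        # characters of visible text in the region
    link_text_len: int = 0   # characters of that text that sit inside hyperlinks
    children: List["Block"] = field(default_factory=list)


def segment(block: Block) -> List[Block]:
    """Top-down split: descend from the whole page until regions have no sub-regions."""
    if not block.children:
        return [block]
    leaves: List[Block] = []
    for child in block.children:
        leaves.extend(segment(child))
    return leaves


def link_density(block: Block) -> float:
    """Share of a block's text that is link text; empty blocks count as all links."""
    return block.link_text_len / block.text_len if block.text_len else 1.0


def remove_noise(blocks: List[Block],
                 noise_interval: Tuple[float, float] = (0.6, 1.0)) -> List[Block]:
    """Drop blocks whose link density falls inside the (hypothetical) noise interval."""
    lo, hi = noise_interval
    return [b for b in blocks if not (lo <= link_density(b) <= hi)]


# A hand-built page: navigation bar, article body, and footer links.
page = Block("page", children=[
    Block("header", children=[Block("nav", text_len=100, link_text_len=95)]),
    Block("main", children=[Block("article", text_len=2000, link_text_len=40)]),
    Block("footer", text_len=50, link_text_len=45),
])

topic_blocks = remove_noise(segment(page))
print([b.name for b in topic_blocks])
```

In an interactive setting like the one the abstract describes, the interval would be adjusted by a person inspecting the results rather than fixed in code; the default above is only a placeholder.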
