Research on Web Page Information Denoising Based on DOM and Display Attributes (cited by: 1)

The method of noise elimination in web information based on DOM tree and display attribute
Abstract: This paper proposes a method for removing noise from web page information based on the Document Object Model (DOM) and the display attributes of web pages. By analyzing the structure and characteristics of web page content, the method divides the content of a page into two parts: information blocks and noise blocks. A parser converts the page into a DOM model and the noise in the page is identified; the DOM model is then simplified according to the page's display attributes, so that noise is effectively removed from the DOM model.
Author: Fu Tao (付涛)
Source: Journal of Shangqiu Normal University (《商丘师范学院学报》), CAS, 2010, No. 9, pp. 90-93 (4 pages)
Keywords: DOM; display attribute; noise; parser
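
The abstract outlines a pipeline (parse the page into a DOM, identify noise, simplify the DOM using display attributes, remove the noise) but gives no implementation details. The following is only a minimal sketch of that pipeline, not the paper's algorithm. It assumes that "display attributes" can be approximated by the inline `style` and `hidden` attributes, and that a noise block can be approximated by a block element whose text is mostly link text. The names `Node`, `TreeBuilder`, `prune`, `denoise`, and the `max_link_ratio` threshold are all hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
SKIP_TAGS = {"script", "style"}  # assumed never to carry reader-visible content
BLOCK_TAGS = {"div", "ul", "ol", "table", "td", "nav"}  # assumed block candidates


class Node:
    """A minimal DOM node; text is stored in child nodes tagged '#text'."""

    def __init__(self, tag, attrs=None, parent=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = text


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML (assumes reasonably well-nested markup)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray closes.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            text = Node("#text", parent=self.current, text=data.strip())
            self.current.children.append(text)


def is_hidden(node):
    """Assumed display-attribute test: `hidden` attribute or inline style."""
    style = (node.attrs.get("style") or "").replace(" ", "").lower()
    return ("hidden" in node.attrs
            or "display:none" in style
            or "visibility:hidden" in style)


def text_length(node, inside_link=False):
    """Returns (total characters, characters inside <a>) for a subtree."""
    if node.tag == "#text":
        n = len(node.text)
        return n, (n if inside_link else 0)
    inside_link = inside_link or node.tag == "a"
    total = linked = 0
    for child in node.children:
        t, l = text_length(child, inside_link)
        total += t
        linked += l
    return total, linked


def prune(node, max_link_ratio=0.6):
    """Drops hidden subtrees and link-heavy blocks (assumed noise), in place."""
    kept = []
    for child in node.children:
        if child.tag in SKIP_TAGS or is_hidden(child):
            continue
        total, linked = text_length(child)
        if child.tag in BLOCK_TAGS and total > 0 and linked / total > max_link_ratio:
            continue
        prune(child, max_link_ratio)
        kept.append(child)
    node.children = kept


def extract_text(node):
    """Joins the remaining text nodes in document order."""
    if node.tag == "#text":
        return node.text
    return " ".join(t for t in map(extract_text, node.children) if t)


def denoise(html):
    """Parse, prune, and return the text of the simplified tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return extract_text(builder.root)


SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main"><p>DOM-based denoising keeps the article text.</p></div>
<div style="display: none">hidden tracking text</div>
<script>var x = 1;</script>
</body></html>
"""
```

Calling `denoise(SAMPLE)` drops the navigation block, the hidden block, and the script, leaving only the paragraph in the `main` block. A fuller implementation would also need to consult external stylesheets and computed styles, which this sketch ignores.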