A New HTML Page Cleaning and Compression Algorithm (cited by: 1)

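Abstract This paper proposes a new HTML page cleaning and compression algorithm for Web information extraction. The algorithm makes full use of the relative position information of the tags in the HTML page tree. Experiments show that it handles syntax errors in pages effectively and compresses redundant page data, which gives it good practical value and application prospects.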
Author 任仲晟
Source 《福建电脑》 (Journal of Fujian Computer), 2009, No. 1, pp. 60-61 (2 pages)