Research on Web Page Purification Technology Based on the Tag Tree (cited by: 3)

Web Page Distillation Based on the Tag Tree
Abstract: How to obtain the information people need from the Internet by computer is a key problem, and an algorithm is proposed to address it. First, a tag tree is built from the tags of the Web page. The page is then divided into several parts (main part, site flag, navigation bar, communication part, and copyright), and the tag tree captures the relationships among these parts. By parsing the tag tree, the authors keep only the subtree that carries the main part, so the main content is obtained and the page is distilled.
Authors: Li Ming (李明), Zhang Weiqun (张为群)
Source: Journal of Southwest China Normal University (Natural Science Edition), indexed in CAS, CSCD, and the Peking University Core Journals list; 2006, No. 5, pp. 128-131 (4 pages)
Keywords: tag tree; tag tree model; web page distillation
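
The abstract describes the approach only in outline: build a tag tree, then keep the subtree that holds the main part. The following is a minimal Python sketch of that general idea, not the authors' algorithm. The names `Node`, `TagTreeBuilder`, `build_tag_tree`, and `main_subtree` are hypothetical, and the selection rule (descend while one child holds more than 80% of the non-link text) is an assumed heuristic standing in for the paper's unstated classification of page parts.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become the open node.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One node of the tag tree: a tag, its attributes, children, and own text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = []  # text chunks that sit directly inside this tag

    def text_len(self, skip_links=False):
        """Count text characters in this subtree, optionally ignoring <a> text."""
        if skip_links and self.tag == "a":
            return 0
        own = sum(len(chunk) for chunk in self.text)
        return own + sum(child.text_len(skip_links) for child in self.children)

    def text_content(self):
        """Join the subtree's text (own text first, then each child's text)."""
        parts = list(self.text) + [c.text_content() for c in self.children]
        return " ".join(p for p in parts if p)


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using only the standard library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag in ("script", "style"):
            return  # code and CSS are not page content
        data = data.strip()
        if data:
            self.current.text.append(data)


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def main_subtree(root, share=0.8):
    """Assumed heuristic: descend while one child holds most of the non-link text."""
    node = root
    while True:
        total = node.text_len(skip_links=True)
        best = max(node.children,
                   key=lambda c: c.text_len(skip_links=True), default=None)
        if best is None or total == 0:
            return node
        if best.text_len(skip_links=True) <= share * total:
            return node
        node = best


if __name__ == "__main__":
    page = ('<html><body><div id="nav"><a href="/">Home</a></div>'
            '<div id="main"><p>First paragraph of the article body.</p>'
            '<p>Second paragraph of the article body.</p></div>'
            '<div id="footer">Copyright</div></body></html>')
    main = main_subtree(build_tag_tree(page))
    print(main.tag, main.attrs.get("id"))
```

Link text is excluded from the score because navigation bars consist almost entirely of anchors. A real page would likely need per-part rules for the site flag, communication part, and copyright blocks, which the abstract names but does not specify.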

References (4)

  • 1. DOM Interest Group. Document Object Model (DOM) [EB/OL]. http://www.w3.org/DOM/, 2006-06-12.
  • 2. Valter Crescenzi, Giansalvatore Mecca. Automatic Information Extraction From Large Websites [J]. Journal of the ACM, 2004, 51(5): 731-779.
  • 3. Valter Crescenzi, Paolo Merialdo, Paolo Missier. Fine-Grain Web Site Structure Discovery [A]. Proceedings of the Fifth ACM International Workshop on Web Information and Data Management [C]. New Orleans: ACM Press, 2003. 382-397.
  • 4. Chen J L, Zhou B Y, Shi J, et al. Function Based Object Model Towards Website [A]. Hong Kong: 10th International World Wide Web Conference [C]. 2001. 587-596.

Co-cited literature (22)

  • 1. Chang Yuhong, Jiang Zhe, Zhu Xiaoyan. Page structure analysis based on a tag tree representation [J]. Computer Engineering and Applications, 2004, 40(16): 129-132. Cited by: 24
  • 2. Lin Keqiang, Zuo Zhihong, Lin Lin. Research on Web table information extraction [J]. Journal of Communication and Computer (Chinese and English Edition), 2005, 2(8): 27-31. Cited by: 1
  • 3. Liu Jie, Shu Bo. An efficient method for converting HTML/XHTML to WML [J]. Journal of Beijing Technology and Business University (Natural Science Edition), 2006, 24(6): 45-48. Cited by: 2
  • 4. Zhang Rui, Li Shijun. Automatic conversion of online table data to XML [J]. Computer Engineering and Applications, 2007, 43(2): 190-192. Cited by: 5
  • 5. The Apache Software Foundation. Cocoon [CP/OL]. http://cocoon.apache.org/, 2007-12-14.
  • 6. Liu Zhen, Meng Xiangwu. A Web application solution based on Cocoon [EB/OL]. http://www.paper.edu.cn/paper.php?serial_number=200704-211, 2007-04-09.
  • 7. Philippe LeHégaret, Ray Whitmer, Lauren Wood. Document Object Model (DOM) [EB/OL]. http://www.w3c.org/DOM, 2005-1-19.
  • 8. Laurent Bouillon, Jean Vanderdonckt, Jacob Eisenstein. Model-Based Approaches to Reengineering Web Pages [C] // Proceedings of International Workshop on Task Model and Diagrams for user interface design TAMODIA'2002. Bucharest: INFOREC Publishing House Bucharest, 2002: 86-95.
  • 9. Guido Menkhaus, Sebastian Fischmeister. Dialog Model Clustering for User Interface Adaptation [C] // Proceedings of ICWE 03. Oviedo: Springer Verlag, 2003: 194-203.
  • 10. Jean Vanderdonckt, Laurent Bouillon, Nathalie Souchon. Flexible Reverse Engineering of Web Pages with VAQUITA [C] // Proceedings of IEEE 8th Working Conference on Reverse Engineering WCRE'2001. Stuttgart: IEEE Computer Society Press, 2001: 241-248.

Citing literature (3)

Second-level citing literature (4)
