
Improvement on the Algorithm for the Automatic Page Wrapper Generation (cited 3 times)
Abstract: This paper presents an improved algorithm for the automatic generation of page wrappers. When it generates a wrapper by matching two HTML pages, the algorithm is built on a tree data model and runs more efficiently than the original algorithm.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University core journal list, 2004, No. 22, pp. 113-115 and 122 (4 pages).
Keywords: Web data extraction; wrapper; matching algorithm; algorithm optimization
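The abstract names the idea (matching two HTML pages over a tree data model) but does not give the improved algorithm itself. The following is therefore only a minimal Python sketch of that general idea, not the authors' method: each page is parsed into a tag tree, and the two trees are matched top-down, keeping text that agrees as literal page structure and turning text that differs into a data field. The names `Node`, `parse`, `match`, and the `#PCDATA` placeholder are hypothetical. The sketch assumes reasonably well-formed input (for example, pages already cleaned with HTML Tidy) and, unlike a full RoadRunner-style matcher, it simply gives up on structural mismatches instead of inferring optional or repeated regions.

```python
from html.parser import HTMLParser


class Node:
    """A node of a simplified tag tree (hypothetical data model for this sketch)."""

    def __init__(self, tag, text=None):
        self.tag = tag          # element name, "#text", or "#root"
        self.text = text        # text content for "#text" nodes
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tag tree; assumes reasonably well-formed (e.g. tidied) HTML."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)  # attributes are ignored in this sketch
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(Node("#text", text))


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def match(a, b):
    """Match two trees top-down; return a wrapper template or None on mismatch."""
    if a.tag != b.tag:
        return None
    if a.tag == "#text":
        # Identical text is treated as page structure; differing text as data.
        return a.text if a.text == b.text else "#PCDATA"
    if len(a.children) != len(b.children):
        return None  # a full matcher would try optional/repeated regions here
    parts = []
    for ca, cb in zip(a.children, b.children):
        sub = match(ca, cb)
        if sub is None:
            return None
        parts.append(sub)
    inner = "".join(parts)
    return inner if a.tag == "#root" else f"<{a.tag}>{inner}</{a.tag}>"


if __name__ == "__main__":
    page1 = "<html><body><h1>Books</h1><p>Title: <b>Alpha</b></p></body></html>"
    page2 = "<html><body><h1>Books</h1><p>Title: <b>Beta</b></p></body></html>"
    # prints <html><body><h1>Books</h1><p>Title:<b>#PCDATA</b></p></body></html>
    print(match(parse(page1), parse(page2)))
```

In the resulting template, `#PCDATA` marks the positions a wrapper would extract from further pages of the same kind. Because whole subtrees are compared node by node, a structural disagreement is detected at the first differing node; how the paper's version improves on the original algorithm is not described in the abstract.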
Related Literature

References (6)

  • [1] V Crescenzi, G Mecca, P Merialdo. RoadRunner: Towards Automatic Data Extraction from Large Web Sites[C]. In: Proceedings of the 27th International Conference on Very Large Data Bases
  • [2] Alberto H F Laender, Berthier A Ribeiro-Neto et al. A Brief Survey of Web Data Extraction Tools[J]. ACM SIGMOD Record, 2002, 31(2)
  • [3] Joachim Hammer, Jason McHugh, Hector Garcia-Molina. Semistructured Data: The TSIMMIS Experience[C]. In: Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS'97), 1997: 1-8
  • [4] Huang Yuqing, Qi Guangzhi, Zhang Fuyan. Constructing an Extractor of Semi-structured Information from Web Documents[J]. Journal of Software (软件学报), 2000, 11(1): 73-78. (cited 47 times)
  • [5] J McHugh, S Abiteboul, R Goldman et al. Lore: A Database Management System for Semistructured Data[J]. ACM SIGMOD Record, 1997, 26(3): 54-66
  • [6] http://www.w3.org/People/Raggett/tidy

Secondary References (1)

  • [1] Hammer J. SIGMOD Record, 1997, 26(2): 18

Co-citing Literature (46)

Co-cited Literature (14)

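  • [1] Zhang Zhigang, Chen Jing, Li Xiaoming. An HTML Web Page Cleaning Method[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2004, 23(4): 387-393. (cited 57 times)
  • [2] Chang Yuhong, Jiang Zhe, Zhu Xiaoyan. Page Structure Analysis Based on a Tag-Tree Representation[J]. Computer Engineering and Applications, 2004, 40(16): 129-132. (cited 24 times)
  • [3] Wang Ru, Song Hantao, Lu Yuchang. An Automatic Web Page Data Extraction System[J]. Computer Engineering and Applications, 2004, 40(19): 135-138. (cited 8 times)
  • [4] NEWSBOT[EB/OL]. http://newsbot.msnbc.msn.com/about.aspx, 2005.
  • [5] CRESCENZI V, MECCA G. RoadRunner: Towards Automatic Data Extraction from Large Web Sites[A]. Proceedings of the 27th VLDB Conference[C]. 2001.
  • [6] REIS DC, GOLGHER PB, SILVA AS, et al. Automatic Web News Extraction Using Tree Edit Distance[A]. Proceedings of the 13th International World Wide Web Conference (WWW 2004)[C]. 2004: 504-505.
  • [7] YI L, LIU B, LI XL. Eliminating Noisy Information in Web Pages for Data Mining[A]. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C]. Washington, DC, USA, 2003: 296-305.
  • [8] LIN SH, HO JM. Discovering Informative Content Blocks from Web Documents[A]. Proceedings of ACM SIGKDD'02[C]. 2002.
  • [9] YANG W. Identifying Syntactic Differences Between Two Programs[J]. Software: Practice and Experience, 1991, 21(7): 739-755.
  • [10] Wang Zhongfeng, Wang Zhihai. An Optimization Algorithm for Bayesian Network Classifiers Based on the Derivative of the Conditional Log-Likelihood Function[J]. Chinese Journal of Computers (计算机学报), 2012, 35(2): 364-374. (cited 19 times)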

Citing Literature (3)

Secondary Citing Literature (3)
