
Dynamical Data Regions Identification and Extraction in Web Pages
Cited by: 8
Abstract: This paper uses an improved data-block search method based on the HTML tag tree to mine the data regions embedded in a Web page. On this basis, it combines Web page clustering with cross-page data region matching to automatically identify the dynamic data regions in a page. Experimental results show that the method improves both the recall and the precision of dynamic data region identification in Web pages.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2007, No. 11, pp. 53-55, 58 (4 pages)
Funding: Doctoral Student Innovation Fund of Xidian University (A06047)
Keywords: Web data region extraction; dynamic data region identification; cross-page analysis
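The abstract names three steps: finding data blocks in the HTML tag tree, clustering pages, and matching data regions across pages. The record does not give the paper's actual algorithm, so the following is only a minimal sketch of the general idea under loud assumptions: the pages passed in are assumed to come from one cluster already (no clustering is done here), a "data region" is approximated as any node with at least `min_repeats` children of the same tag, and cross-page matching is reduced to comparing the text of regions that share a tag path. All names (`Node`, `TreeBuilder`, `data_regions`, `dynamic_regions`) are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []

    def path(self):
        # Indexed tag path from the root, e.g. "html[0]/body[0]/ul[0]".
        parts = []
        node = self
        while node.parent is not None:
            same = [c for c in node.parent.children if c.tag == node.tag]
            parts.append(f"{node.tag}[{same.index(node)}]")
            node = node.parent
        return "/".join(reversed(parts))

    def all_text(self):
        # Whitespace-normalised text of this node and its descendants.
        own = " ".join(self.text)
        rest = " ".join(c.all_text() for c in self.children)
        return " ".join((own + " " + rest).split())


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree; tolerant of stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if any.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def data_regions(root, min_repeats=2):
    """Map tag path -> text for nodes with repeated same-tag children."""
    regions = {}
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        counts = {}
        for child in node.children:
            counts[child.tag] = counts.get(child.tag, 0) + 1
        if node is not root and counts and max(counts.values()) >= min_repeats:
            regions[node.path()] = node.all_text()
    return regions


def dynamic_regions(pages, min_repeats=2):
    """Paths of regions found in every page whose text differs across pages."""
    per_page = []
    for html in pages:
        builder = TreeBuilder()
        builder.feed(html)
        builder.close()
        per_page.append(data_regions(builder.root, min_repeats))
    shared = set(per_page[0]).intersection(*per_page[1:])
    return sorted(p for p in shared if len({r[p] for r in per_page}) > 1)


# Two made-up pages sharing a template: the menu is static, the table varies.
NAV = "<ul><li>Home</li><li>About</li></ul>"
page_a = f"<html><body>{NAV}<table><tr><td>Book A</td></tr><tr><td>Book B</td></tr></table></body></html>"
page_b = f"<html><body>{NAV}<table><tr><td>Book C</td></tr><tr><td>Book D</td></tr></table></body></html>"
print(dynamic_regions([page_a, page_b]))
```

Exact text comparison is the crudest possible stand-in for region matching; a tree-similarity measure such as tree edit distance would tolerate regions whose record count changes from page to page, at the cost of more computation.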


