
Deep Web result pattern extracting based on heuristic information
Abstract: Obtaining schema information is a necessary step in Deep Web data research. To address the loss of structural information in Deep Web result schemas (result patterns), this paper proposes a method for obtaining them based on heuristic information. The method parses the data on Deep Web result pages, uses heuristic information to assign correct attribute names to that data, and thereby derives the result schema of the corresponding Deep Web source. The schema is then normalized to resolve structural inconsistencies among the result schemas of different data sources. Experiments show that the method can effectively obtain Deep Web result schema information.
Authors: Li Ming (李明), Li Xiulan (李秀兰)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2011, No. 8, pp. 3026-3029 (4 pages)
Funding: Natural Science Foundation of Gansu Province (0809RJZA018)
Keywords: Deep Web; result schema; feature matrix of Web page data; heuristic information
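
The pipeline summarized in the abstract (parse result records, assign attribute names with heuristics, then normalize the schema) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the regular-expression rules in `HEURISTICS`, the synonym table `CANONICAL`, and the functions `label_column` and `extract_schema` are all hypothetical, and the paper's feature matrix of Web page data is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical heuristic rules: (attribute name, pattern a whole value must match).
HEURISTICS = [
    ("price", re.compile(r"[$¥€]\s?\d+(\.\d+)?")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("isbn", re.compile(r"\d{13}")),
]

# Hypothetical synonym table used to normalize names across data sources.
CANONICAL = {"cost": "price", "published": "date", "name": "title"}


def label_column(values, fallback):
    """Return the attribute name whose pattern matches a majority of values."""
    votes = Counter()
    for value in values:
        for name, pattern in HEURISTICS:
            if pattern.fullmatch(value.strip()):
                votes[name] += 1
                break
    if votes:
        name, count = votes.most_common(1)[0]
        if count * 2 > len(values):  # require a strict majority
            return name
    return fallback


def extract_schema(records, page_labels=None):
    """Derive a normalized schema from already-extracted result records.

    records: list of equal-length tuples of strings, one tuple per result row.
    page_labels: optional {column index: label text found on the page}
                 (assumed here to take priority over the pattern rules).
    """
    page_labels = page_labels or {}
    schema = []
    for i, column in enumerate(zip(*records)):
        raw = page_labels.get(i) or label_column(column, f"attr_{i}")
        name = raw.strip().lower()
        schema.append(CANONICAL.get(name, name))  # normalization step
    return schema
```

For two book records such as `("Deep Web Basics", "$12.50", "2010-03-01")` and `("Schema Matching", "$30", "2009-11-15")`, with the first column labeled `"Name"` on the page, this sketch yields the same normalized schema that a source labeling the same column `"Title"` would produce, which is the kind of structural inconsistency the normalization step is meant to remove.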








