
Wrapper Maintenance During Web Data Extracting
Abstract: When the structure of a web page changes dynamically, a wrapper built to extract data from that page often stops working. To address this problem, a wrapper maintenance model is proposed. Experiments show that the model supports web data extraction more effectively when the page's data structure changes.
Authors: DENG Shasha, LI Jia
Source: Journal of Shanghai University of Electric Power (CAS), 2011, No. 4, pp. 378-382 (5 pages)
Keywords: Wrapper maintenance; web data extracting; semantic block

