
Extraction Rules for Web Pages with Multiple Information Blocks (cited by: 6)

Extraction Rule of MIB Web Page
Abstract: Existing wrappers mainly target Web pages that contain a single data block and cannot handle Web pages that contain multiple information blocks, referred to as MIB (Multiple Information Block) Web pages. This paper proposes a new extraction rule that combines the advantages of extraction rules based on document structure with those of extraction rules based on feature-pattern matching, and that can effectively extract the information in MIB Web pages.
Source: Computer Engineering (《计算机工程》), 2003, No. 9, pp. 42-44, 50 (4 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
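Funding: National Natural Science Foundation of China (60073030); National High Technology Research and Development Program of China ("863" Program) (2001AA114041)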
Keywords: Web; information extraction; wrapper; extraction rule; information integration

References (9)

  • 1. Hammer J, Garcia-Molina H, Cho J, et al. Extracting Semistructured Information from the Web. Proceedings of the First Workshop on Management of Semistructured Data, 1997-05.
  • 2. Sahuguet A, Azavant F. Building Light-weight Wrappers for Legacy Web Data-sources Using W4F. International Conference on Very Large Databases (VLDB), 1999.
  • 3. Soderland S. Learning Information Extraction Rules for Semistructured and Free Text. Machine Learning, 1999.
  • 4. Kushmerick N, Weld D, Doorenbos B. Wrapper Induction for Information Extraction. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), 1997.
  • 5. Muslea I, Minton S, Knoblock C. STALKER: Learning Extraction Rules for Semistructured, Web-based Information Sources. AAAI-98 Workshop on "AI & Information Integration", 1998.
  • 6. Muslea I. Extraction Patterns: From Information Extraction to Wrapper Induction. Technical Report, Information Sciences Institute, University of Southern California, 1998.
  • 7. Doorenbos R B, Etzioni O, Weld D W. A Scalable Comparison-shopping Agent for the World Wide Web. In Proceedings of the First International Conference on Autonomous Agents, 1997-02.
  • 8. Gao X, Sterling L. AutoWrapper: Automatic Wrapper Generation for Multiple Online Services. In Proceedings of Asia Pacific Web Conference 1999 (AP-Web99), 1999.
  • 9. Chang C H, Lui S C. IEPAD: Information Extraction Based on Pattern Discovery. In Proceedings of the Tenth International Conference on World Wide Web, Hong Kong, 2001-05.

Co-cited documents (20)

Citing documents (6)

Second-level citing documents (10)
