
Design and Implementation of an Automatic Web Extraction Method

Cited by: 3
Abstract: Current Web information extraction techniques are complex to implement, difficult to maintain, and slow. To address these problems, this paper proposes a new Web extraction strategy based on the characteristics of Web pages. The strategy reduces the complexity of handling page structure and thereby speeds up Web information extraction. A model of the automatic Web information extraction method is built on this strategy. The model first analyzes the structure of a page, quickly generates extraction rules from that structure, and builds a rule library; it then analyzes the content extracted from the page and builds a resource library. A method based on this model can learn autonomously and extract information automatically, which greatly reduces manual involvement while still giving relatively good extraction results.
Source: Computer and Modernization (《计算机与现代化》), 2009, No. 1, pp. 38-40, 48 (4 pages)
Keywords: Web information extraction; Web extraction strategy; autonomous learning; extraction rule
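
The abstract names the stages of the model (analyze page structure, generate extraction rules, build a rule library, then extract content into a resource library) but does not say how the rules are derived. The following is only a minimal sketch of such a pipeline, not the authors' method. It assumes that a rule is a tag path and that the most frequent tag path in a sample page marks the data region. The names `PathTextParser`, `generate_rule`, `extract`, `rule_library`, and `resource_library`, and the site key `example-site`, are hypothetical and chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class PathTextParser(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    # Tags that never have a closing tag, so they are not pushed on the stack.
    VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.items = []   # (tag path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def parse(html):
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


def generate_rule(sample_html):
    """Assumed heuristic: treat the most frequent tag path as the data region."""
    counts = Counter(path for path, _ in parse(sample_html))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def extract(html, rule):
    """Apply a rule (a tag path) and return the matching text nodes."""
    return [text for path, text in parse(html) if path == rule]


rule_library = {}      # site key -> extraction rule
resource_library = {}  # site key -> extracted content

# Learn a rule from one sample page of the site.
sample = "<html><body><h1>News</h1><ul><li>Item A</li><li>Item B</li></ul></body></html>"
rule_library["example-site"] = generate_rule(sample)

# Reuse the stored rule on a new page with the same structure.
new_page = (
    "<html><body><h1>News</h1>"
    "<ul><li>Item C</li><li>Item D</li><li>Item E</li></ul></body></html>"
)
resource_library["example-site"] = extract(new_page, rule_library["example-site"])

print(rule_library["example-site"])      # html/body/ul/li
print(resource_library["example-site"])  # ['Item C', 'Item D', 'Item E']
```

Storing the rule per site means the structural analysis runs once and later pages only need a path comparison, which is one plausible reading of how a rule library could speed up extraction.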

