
A Survey of Web Information Extraction Systems

Cited by: 1
Abstract: This paper reviews the current state of research on Web information extraction systems. Existing systems are compared along two dimensions, extraction technology and degree of automation, and are divided on that basis into three types: non-automated, semi-automated and fully automated. The strengths and weaknesses of the three types are then evaluated against several factors: tokenization method, the types and features of extraction rules, learning algorithms, degree of user participation, applicability, and output interface. Finally, directions for further research on Web information extraction systems are outlined.
Author: Xu Qi (许琦)
Source: Library and Information Service (《图书情报工作》), a CSSCI-indexed and Peking University core journal, 2011, no. 3, pp. 106-110, 124 (6 pages)
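Funding: This paper is one of the research outcomes of the Zhejiang Province Program for Supporting Outstanding Young University Teachers project "Research and Application of a Knowledge Information Service Platform for Multi-Terminal Devices" and the Zhejiang Provincial Department of Education research project "Research on Key Technologies for the Collaborative Construction and Management of Patent Pools" (project no. Y200909672).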
Keywords: information extraction; wrapper; extraction technology; automation degree
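A wrapper, as listed in the keywords, is commonly understood as a set of extraction rules that pulls structured records out of a Web page. Systems at the non-automated end of the classification typically depend on rules written by hand. The sketch below illustrates that idea, assuming the widely used left/right-delimiter form of a rule. `DelimiterRule`, `apply_wrapper`, and the sample page are hypothetical illustrations and are not taken from the paper.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class DelimiterRule:
    """Hypothetical hand-written extraction rule: capture the text that
    lies between a left delimiter and a right delimiter."""
    field: str
    left: str
    right: str


def apply_wrapper(page: str, rules: list[DelimiterRule]) -> list[dict[str, str]]:
    """Apply every rule to the page and pair the i-th match of each field
    into the i-th record. Assumes each record contains every field exactly
    once and in the same order (a simplification)."""
    columns: dict[str, list[str]] = {}
    for rule in rules:
        # Non-greedy group so each match stops at the nearest right delimiter.
        pattern = re.escape(rule.left) + r"(.*?)" + re.escape(rule.right)
        columns[rule.field] = re.findall(pattern, page, flags=re.DOTALL)
    if not columns:
        return []
    # Emit only as many records as the shortest column supports.
    count = min(len(values) for values in columns.values())
    return [
        {name: values[i].strip() for name, values in columns.items()}
        for i in range(count)
    ]


if __name__ == "__main__":
    page = (
        "<ul><li><b>Camera</b> <i>$299</i></li>"
        "<li><b>Phone</b> <i>$199</i></li></ul>"
    )
    rules = [
        DelimiterRule("name", "<b>", "</b>"),
        DelimiterRule("price", "<i>", "</i>"),
    ]
    # prints [{'name': 'Camera', 'price': '$299'}, {'name': 'Phone', 'price': '$199'}]
    print(apply_wrapper(page, rules))
```

Pairing matches by position keeps the sketch short but breaks as soon as a record omits a field. Writing and maintaining such rules by hand is the manual effort that the semi-automated and fully automated categories aim to reduce.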