Implementation of a rule induction-based information extraction system (cited by: 19)
Abstract: With the rapid growth of Web information, information extraction (IE) techniques are well suited to extracting the required factual data from large numbers of documents. This paper presents the design and implementation of an IE system with rule induction capability, which automates Web information retrieval through Document Object Model (DOM) parsing and the definition of retrieval, extraction, and mapping rules. Within this framework for inducing extraction rules, the authors focus on a comparative experimental analysis of the WHISK learning algorithm, which is used to generate extraction patterns. The experimental results show that the system has good inductive learning ability on both single-slot and multi-slot data.
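Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2008, No. 21, pp. 166-170 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 60775028); Major Project of the Dalian Science and Technology Bureau (No. 2007A14GX042); Open Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (No. 93K-17-2006-04)
Keywords: information extraction; extraction rule; DOM; learning algorithm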
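
The abstract mentions DOM parsing followed by extraction rules, with WHISK generating multi-slot patterns. The sketch below is only an illustration of that general idea, not the authors' system and not the WHISK learner itself: it applies one hand-written, WHISK-style pattern to text pulled out of an HTML fragment. The names `TextCollector`, `html_to_text`, `compile_rule`, and `extract`, the slot classes `(Digit)` and `(Number)`, and the sample ads are all assumptions made for this example; a flat text collector stands in for full DOM parsing.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects non-empty text nodes (a simple stand-in for DOM parsing)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML fragment, joined by single spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def compile_rule(terms):
    """Turn a simplified WHISK-style term list into a regular expression.

    "*" skips as little text as possible, "(Digit)" and "(Number)" are
    extraction slots, and any other term is matched literally.
    """
    slot_classes = {"(Digit)": r"(\d)", "(Number)": r"(\d+(?:\.\d+)?)"}
    parts = []
    for term in terms:
        if term == "*":
            parts.append(r".*?")
        elif term in slot_classes:
            parts.append(slot_classes[term])
        else:
            parts.append(re.escape(term))
    return re.compile(r"\s*".join(parts), re.IGNORECASE | re.DOTALL)


def extract(rule, text, slot_names):
    """Apply the rule repeatedly and return one dict of slots per match."""
    return [dict(zip(slot_names, m.groups())) for m in rule.finditer(text)]


# Hypothetical multi-slot rule: bedrooms and price from rental ads.
RULE = compile_rule(["*", "(Digit)", "br", "*", "$", "(Number)"])
SAMPLE = "<p>Capitol Hill - <b>1 br</b> twnhme. $675</p><p>3 BR upper flr. $995</p>"

if __name__ == "__main__":
    print(extract(RULE, html_to_text(SAMPLE), ["bedrooms", "price"]))
```

A single rule fills two slots per match, which is the sense in which a pattern is multi-slot; a single-slot rule would contain only one parenthesized term. Inducing such rules from labeled examples, which is what the paper evaluates, is outside the scope of this sketch.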