
Web Information Extraction Based on Genetic Algorithm (cited by: 2)
Abstract: WHISK is a semi-automatic information extraction (IE) system that applies learned extraction rules to both structured and semi-structured Web text. Its rule learning procedure, however, does not guarantee that rules are extended in an optimal way, and generating the rule set is time-consuming. To address these problems, this paper uses a genetic algorithm to improve WHISK's supervised learning algorithm and generates the rule set with a removal method. Experimental results show that the approach improves both efficiency and recall.
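The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy rule representation (a rule is a set of terms that must all occur in an instance), a bit-mask chromosome over hypothetical candidate terms, and a fitness based on the Laplacian expected error (e + 1) / (n + 1) from the original WHISK description. The "removal method" is interpreted here, as an assumption, as a covering loop that removes the positive instances each new rule already covers. All names (`TERMS`, `DATA`, `evolve`, `learn_rule_set`) are illustrative.

```python
import random

def laplacian(errors, extractions):
    """Laplacian expected error: (e + 1) / (n + 1). Lower is better."""
    return (errors + 1) / (extractions + 1)

def to_rule(mask, terms):
    """Decode a bit-mask chromosome into a toy rule (a set of required terms)."""
    return {t for t, bit in zip(terms, mask) if bit}

def evolve(terms, data, rng, pop_size=20, generations=30, p_mut=0.1):
    """Search for a low-error rule with a simple genetic algorithm (assumed operators)."""
    def score(mask):
        rule = to_rule(mask, terms)
        fired = [ok for tokens, ok in data if rule <= tokens]
        errors = sum(not ok for ok in fired)
        return laplacian(errors, len(fired))

    pop = [[rng.randint(0, 1) for _ in terms] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(terms))    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return to_rule(min(pop, key=score), terms)

def learn_rule_set(terms, data, rng):
    """Covering loop: learn a rule, remove the positives it covers, repeat."""
    remaining = list(data)
    rules = []
    while any(ok for _, ok in remaining):
        rule = evolve(terms, remaining, rng)
        if not any(ok and rule <= tokens for tokens, ok in remaining):
            break                                 # no progress; stop instead of looping
        rules.append(rule)
        remaining = [(tokens, ok) for tokens, ok in remaining
                     if not (ok and rule <= tokens)]
    return rules

# Hypothetical training data: (terms in the instance, extraction is correct?)
TERMS = ["price", "bedroom", "garage", "pets"]
DATA = [
    (frozenset({"price", "bedroom"}), True),
    (frozenset({"price", "bedroom", "garage"}), True),
    (frozenset({"price", "pets"}), False),
    (frozenset({"garage", "pets"}), True),
    (frozenset({"pets"}), False),
]

if __name__ == "__main__":
    for rule in learn_rule_set(TERMS, DATA, random.Random(0)):
        print(sorted(rule))
```

Real WHISK rules are regular-expression-like patterns with extraction slots, so a faithful implementation would need a richer chromosome encoding than the term sets used here.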
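Authors: Guo Yinrui (郭银蕊), Chen Rong (陈荣)
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 3, pp. 385-390 (6 pages)
Funding: National Natural Science Foundation of China (No. 60775028); Jilin Province Information Industry Development Special Fund Project (吉信发[2008]40号); Major Project of the Dalian Science and Technology Bureau (No. 2007A14GX042)
Keywords: Information Extraction, WHISK System, Genetic Algorithm, Rule Learning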

