
Information Extraction from Chinese Research Papers Based on Conditional Random Fields

Times cited: 11
Abstract: Header and citation information of research papers is indispensable for field-based paper retrieval, paper statistics, and citation analysis. Because hidden Markov models (HMMs) cannot fully exploit the contextual features that are useful for extraction, this paper proposes a method based on conditional random fields (CRFs) for extracting header and citation information from Chinese research papers. The key issues of the method are model parameter estimation and feature selection. In the experiments, the model parameters are learned with the L-BFGS algorithm, and four categories of features (local, layout, lexicon, and state-transition features) form the model's feature set. During extraction, the text is first segmented into blocks using formatting cues such as separators and specific identifiers; the CRF then extracts the specified fields from these blocks. Experimental results show that the method clearly outperforms an HMM-based method, and that adding different feature sets improves extraction performance to different degrees.
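The extraction procedure described in the abstract (format-based segmentation, then CRF labeling of the blocks) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the separator pattern, the label set, the feature names (`short`, `long`, `is_year`, `venue_word`, `first_block`), the venue word list, and all weights are hypothetical. L-BFGS training is omitted, so the weights are hand-set placeholders, and labeling whole blocks rather than individual characters is also an assumption. Decoding uses the Viterbi algorithm, the standard way to find the highest-scoring label sequence in a linear-chain CRF.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical label set for one reference entry (the paper's field inventory may differ).
LABELS = ["AUTHOR", "TITLE", "VENUE", "YEAR"]

# Assumed format cues: ASCII/full-width punctuation and document-type tags such as "[J]".
SEPARATORS = re.compile(r"\[[A-Z]\]|[,.，。:：;；]")

# Hypothetical lexicon of words that signal a publication venue.
VENUE_WORDS = ("学报", "期刊", "Journal", "Proceedings")

# Hand-set placeholder weights for (feature, label) pairs; the paper learns weights with L-BFGS.
EMIT: Dict[Tuple[str, str], float] = {
    ("is_year", "YEAR"): 5.0,
    ("short", "AUTHOR"): 2.0,
    ("long", "TITLE"): 2.0,
    ("venue_word", "VENUE"): 4.0,
    ("first_block", "AUTHOR"): 1.0,
}
# State-transition feature weights, keyed by (previous label, current label).
TRANS: Dict[Tuple[str, str], float] = {
    ("AUTHOR", "AUTHOR"): 1.0,
    ("AUTHOR", "TITLE"): 1.0,
    ("TITLE", "VENUE"): 1.0,
    ("VENUE", "YEAR"): 1.0,
}


def segment_blocks(citation: str) -> List[str]:
    """Step 1: split the text into blocks at separators, dropping empty pieces."""
    return [b.strip() for b in SEPARATORS.split(citation) if b.strip()]


def block_features(blocks: List[str], i: int) -> List[str]:
    """Return the names of the binary features that fire for block i."""
    b = blocks[i]
    feats = ["short" if len(b) <= 4 else "long"]  # local: block length
    if re.fullmatch(r"(19|20)\d{2}", b):
        feats.append("is_year")  # local: looks like a four-digit year
    if any(w in b for w in VENUE_WORDS):
        feats.append("venue_word")  # lexicon: contains a venue cue word
    if i == 0:
        feats.append("first_block")  # layout: position within the entry
    return feats


def emit_score(feats: List[str], label: str) -> float:
    """Sum of weights for the features that fire together with this label."""
    return sum(EMIT.get((f, label), 0.0) for f in feats)


def viterbi(blocks: List[str]) -> List[str]:
    """Step 2: return the label sequence with the highest total CRF score.

    The partition function is not needed for decoding, because it is the
    same for every candidate label sequence of a given input.
    """
    if not blocks:
        return []
    feats = [block_features(blocks, i) for i in range(len(blocks))]
    # best[i][y]: highest score of any label sequence for blocks[0..i] ending in y
    best = [{y: emit_score(feats[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(blocks)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[i - 1][p] + TRANS.get((p, y), 0.0))
            best[i][y] = best[i - 1][prev] + TRANS.get((prev, y), 0.0) + emit_score(feats[i], y)
            back[i][y] = prev
    # Follow back-pointers from the best final label.
    y = max(LABELS, key=lambda label: best[-1][label])
    path = [y]
    for i in range(len(blocks) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


def extract_fields(citation: str) -> Dict[str, List[str]]:
    """Segment, decode, and group the blocks by predicted field."""
    blocks = segment_blocks(citation)
    fields: Dict[str, List[str]] = {}
    for block, label in zip(blocks, viterbi(blocks)):
        fields.setdefault(label, []).append(block)
    return fields


if __name__ == "__main__":
    # A made-up reference entry, used only for illustration.
    print(extract_fields("张三,李四.基于条件随机场的信息抽取[J].计算机学报,2006"))
```

Because a linear-chain CRF scores the whole label sequence, the state-transition weights let neighboring decisions influence each other: in the made-up entry, the block `2006` also fires the `short` feature (an AUTHOR cue in this sketch), but the stronger `is_year` weight and the VENUE-to-YEAR transition make YEAR the best label.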
Source: Journal of South China University of Technology (Natural Science Edition), 2007, No. 9, pp. 90-94, 106 (6 pages). Indexed in EI, CAS, and CSCD; Peking University core journal.
Funding: Project supported by the Doctoral Program Foundation of the Ministry of Education of China (20050007023)
Keywords: information extraction; conditional random field; citation information; paper header information

References: 6

Secondary references: 56

Co-citing works: 261

Co-cited works: 120

Citing works: 11

Secondary citing works: 54
