
Hierarchical information extraction from research papers based on conditional random fields

Cited by: 3
Abstract When conditional random fields (CRFs) are used for information extraction, purely word-based or purely block-based methods cannot make full use of context information to segment and extract text at an appropriate granularity. This paper therefore proposes a hierarchical method, based on CRFs, for extracting information from research papers. The method uses format information such as separators, newline characters, and line-initial characters, combined with the feature functions of CRFs, to segment the text into the appropriate level: text lines, blocks, or single words. The L-BFGS algorithm is then used to learn the model parameters, and the specific text fields are extracted. Experimental results show that the proposed method outperforms CRF-based extraction methods that segment text only into words or only into blocks.
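Source Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 10, pp. 3690-3693 (4 pages)
Funding Natural Science Foundation Program of the Chongqing Science and Technology Commission (2007BB2372); China Postdoctoral Science Foundation (20070420711)
Keywords information extraction; conditional random fields (CRFs); hierarchy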
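
As a rough illustration of the segmentation step summarized in the abstract, the sketch below splits a paper header into lines, then blocks, then words, and computes a few format-based cues of the kind a CRF feature function could test. It is not the authors' implementation: the separator set `SEPARATORS`, the feature names, and the sample header are all assumptions made for illustration, and no CRF training or L-BFGS optimization is performed here.

```python
import re
from typing import Dict, List

# Hypothetical separator set; the abstract mentions "separators" without listing them.
SEPARATORS = ",;"


def split_lines(text: str) -> List[str]:
    """Level 1: split on newline characters, dropping empty lines."""
    return [ln.strip() for ln in text.split("\n") if ln.strip()]


def split_blocks(line: str) -> List[str]:
    """Level 2: split one line into blocks at separator characters."""
    pattern = "[" + re.escape(SEPARATORS) + "]"
    return [b.strip() for b in re.split(pattern, line) if b.strip()]


def split_words(block: str) -> List[str]:
    """Level 3: split one block into whitespace-delimited words."""
    return block.split()


def line_features(line: str, index: int) -> Dict[str, object]:
    """Format cues for one line (illustrative names, not from the paper)."""
    first = line[0]
    return {
        "line_index": index,
        "starts_upper": first.isupper(),
        "starts_digit": first.isdigit(),
        "has_separator": any(c in SEPARATORS for c in line),
    }


def segment(text: str) -> List[dict]:
    """Build the line -> block -> word hierarchy with per-line features."""
    result = []
    for i, line in enumerate(split_lines(text)):
        result.append({
            "line": line,
            "features": line_features(line, i),
            "blocks": [
                {"block": b, "words": split_words(b)}
                for b in split_blocks(line)
            ],
        })
    return result


# Made-up header text used only to exercise the functions.
SAMPLE_HEADER = (
    "Hierarchical Extraction with CRFs\n"
    "A. Author, B. Writer\n"
    "Dept. of CS; Some University\n"
)
hierarchy = segment(SAMPLE_HEADER)
```

In a full system, each level of `hierarchy` would be turned into an observation sequence and labeled by a CRF whose weights are fitted with an optimizer such as L-BFGS. Which level is used for which field is a modeling decision that the abstract does not specify.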


