Information extraction of financial announcement based on document structure and deep learning (cited by: 10)
Abstract To address the difficulty of extracting structured data from financial announcements quickly and efficiently, this paper proposes an information extraction method based on document structure and a Bi-LSTM-CRF network model. First, a custom document structure tree generation algorithm is defined, and rules are used to extract the required node information from the tree. Second, local sentence rules based on information-sentence trigger words are constructed to extract the information sentences that contain structured field information. Third, the extraction of structured field information is treated as a sequence labeling problem: a domain knowledge dictionary is added during word segmentation, and a Bi-LSTM-CRF neural network model is built to recognize field information. Experimental results show that the method supports structured information extraction for multiple announcement types, and that the average F1 values for both information sentence extraction and field information extraction exceed 91%, which verifies the feasibility and practicality of the method in product business.
Authors HUANG Sheng (黄胜); WANG Bo-bo (王博博); ZHU Jing (朱菁) (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Key Laboratory of Optical Communications and Networking, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Data Center, Shenzhen Securities Information Limited Company, Shenzhen 518000, China)
Source Computer Engineering and Design (《计算机工程与设计》), PKU Core journal (北大核心), 2020, No. 1, pp. 115-121 (7 pages)
Funding National Natural Science Foundation of China (61371096)
Keywords announcement; information extraction; neural network; document structure tree; sequence labeling
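
The first two steps described in the abstract (building a document structure tree, then selecting information sentences by trigger words) can be illustrated with a minimal sketch. The abstract does not give the paper's actual tree-generation algorithm or rules, so everything below is an assumption made for illustration: the dotted heading numbering ("1", "1.1"), the names Node, build_tree, find_node and info_sentences, the trigger word "pledged", and the sample announcement text are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed heading convention: dotted numbers such as "1" or "1.1" followed by a title.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


@dataclass
class Node:
    """One section of the document: its title, depth, body text and subsections."""
    title: str
    level: int
    text: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def build_tree(lines: List[str]) -> Node:
    """Build a section tree from flat lines using a stack of open sections."""
    root = Node("ROOT", 0)
    stack = [root]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADING_RE.match(line)
        if match:
            level = match.group(1).count(".") + 1
            node = Node(match.group(2), level)
            # Close every open section at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Body text belongs to the innermost open section.
            stack[-1].text.append(line)
    return root


def find_node(node: Node, keyword: str) -> Optional[Node]:
    """Depth-first search for the first section whose title contains keyword."""
    if keyword in node.title:
        return node
    for child in node.children:
        hit = find_node(child, keyword)
        if hit is not None:
            return hit
    return None


def info_sentences(node: Node, triggers: List[str]) -> List[str]:
    """Return the sentences in a section's body that contain any trigger word."""
    found = []
    for paragraph in node.text:
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if sentence and any(t in sentence for t in triggers):
                found.append(sentence)
    return found


# Hypothetical announcement text, invented for this example.
doc = [
    "1 Basic information",
    "The announcement concerns a share pledge.",
    "1.1 Pledge details",
    "The shareholder pledged 5,000,000 shares to Bank A. "
    "The pledge period starts on 2019-03-01.",
    "2 Other matters",
    "No other matters to disclose.",
]

root = build_tree(doc)
node = find_node(root, "Pledge details")
sentences = info_sentences(node, ["pledged"])
print(sentences)
```

In this sketch the tree narrows the search to one section before any sentence-level rule runs, which is one plausible reason to combine the two steps; the selected sentences would then be passed to the sequence labeling model for field recognition.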
