Abstract
Most existing work on information extraction targets data without hierarchical structure. In practical tasks, however, the data in a text often has a complex nested hierarchy: a document may contain a sequence of information blocks of different types, and each block in turn contains its own independent sequence of information. To address information extraction over such hierarchical structure, this paper proposes a hierarchical information extraction method based on joint sequence labeling. On the one hand, a BiLSTM-CNN-CRF model is used to model the data at each level separately; on the other hand, a joint learning method ties the levels together, so that the extraction tasks at different levels can exchange information while still extracting independently, which improves the accuracy of the information extraction task.
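To make the nested structure described in the abstract concrete, the following is a minimal sketch of how two parallel tag sequences, one at the block level and one at the entity level, could be decoded into a hierarchy. It does not implement the BiLSTM-CNN-CRF model or the joint training itself. The BIO tagging scheme, the function names `bio_to_spans` and `nest`, the label names (`EDU`, `WORK`, `ORG`, `DATE`), and the sample tokens are all assumptions made for illustration; none of them are taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into (label, start, end) spans."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    # A trailing "O" sentinel flushes a span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current span keeps growing
        if label is not None:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-") or tag.startswith("I-"):
            # Lenient choice: a stray I- tag also opens a new span.
            label, start = tag[2:], i
    return spans


def nest(tokens: List[str], block_tags: List[str],
         entity_tags: List[str]) -> List[Dict]:
    """Group entity spans under the block span that contains them."""
    entity_spans = bio_to_spans(entity_tags)
    blocks = []
    for b_label, b_start, b_end in bio_to_spans(block_tags):
        entities = [
            {"type": e_label, "text": " ".join(tokens[s:e])}
            for e_label, s, e in entity_spans
            if b_start <= s and e <= b_end
        ]
        blocks.append({"block": b_label, "entities": entities})
    return blocks


# Hypothetical two-block document (illustrative data only).
tokens = ["Tianjin", "University", "2015", "State", "Grid", "2018"]
block_tags = ["B-EDU", "I-EDU", "I-EDU", "B-WORK", "I-WORK", "I-WORK"]
entity_tags = ["B-ORG", "I-ORG", "B-DATE", "B-ORG", "I-ORG", "B-DATE"]

result = nest(tokens, block_tags, entity_tags)
for block in result:
    print(block)
```

In this sketch the two tag sequences are simply given as inputs. In the setting the abstract describes, they would be predicted by the per-level sequence labeling models, with joint learning letting the two levels share information during training.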
Authors
Wang Yang (王扬)
Zheng Yang (郑阳)
Yang Qing (杨青)
Wang Xuqiang (王旭强)
Tian Yuting (田雨婷)
Information Communication Company, State Grid Tianjin Electric Power Company, Tianjin 300310, China
Source
《计算机应用与软件》
Peking University Core Journal (北大核心)
2021, No. 8, pp. 167-174 (8 pages)
Computer Applications and Software
Funding
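National Natural Science Foundation of China (U1633103)
Tianjin Science and Technology Program Project (18ZXZNGX00310)
Science and Technology Project of Tianjin Electric Power Company (kj18-1-17)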
Keywords
Information extraction
Named entity recognition
Neural network
Joint learning