Named Entity Recognition of News Texts in Ten ASEAN Countries (东盟十国新闻文本的命名实体识别)

Cited by: 8

Abstract: To construct a knowledge graph of the ten ASEAN member states, named entity recognition must be performed on the related texts. A neural network model based on a bidirectional GRU-CRF architecture is designed to recognize named entities in Chinese news data from the Chinese embassies in the ten ASEAN member states. Taking pre-trained domain word vectors as input, the bidirectional GRU network extracts semantic features from the vectorized text, and the CRF layer then predicts and outputs the optimal tag sequence. To further improve the results, two hidden layers are added between the bidirectional GRU and the CRF layer. For data preprocessing, a dataset partition algorithm is proposed to divide the text in a more scientific and reasonable way. On the ASEAN dataset, the model is compared with several hybrid models, and the results show that the proposed model achieves better recognition performance for person names, location names, and organization names.
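The abstract states that the CRF layer predicts the optimal tag sequence but gives no implementation details. As an illustration only, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF, the standard way such a layer selects its output. The function name `viterbi_decode`, the tag set, and every score below are hypothetical and are not taken from the paper; the emission scores stand in for whatever the bidirectional GRU and hidden layers would produce.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (hypothetical network output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] = score of the best path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Choose the previous tag that maximizes the score of reaching tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical three-tag scheme and scores, for illustration only.
TAGS = ["O", "B-LOC", "I-LOC"]
transitions = [
    [0.0, 0.0, -1e4],  # from O: moving straight to I-LOC is heavily penalized
    [0.0, 0.0, 1.0],   # from B-LOC
    [0.0, 0.0, 1.0],   # from I-LOC
]
emissions = [
    [2.0, 0.5, 0.1],   # token 0: favors O
    [0.2, 1.2, 1.5],   # token 1: taken alone, slightly favors I-LOC
    [0.3, 0.2, 2.0],   # token 2: favors I-LOC
]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['O', 'B-LOC', 'I-LOC']
```

Picking the best tag for each token independently would give O, I-LOC, I-LOC here, which is not a well-formed entity span. Because decoding scores whole sequences, the transition penalty steers the second token to B-LOC instead.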

Authors: ZHENG Yan-bin (郑彦斌); XIA Zhi-chao (夏志超); GUO Zhi (郭智); HUANG Yong-zhong (黄永忠); LIU Wen-fen (刘文芬) (1. Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin 541004, China; 2. School of Computer Science and Network Security, Dongguan University of Technology, Dongguan 523808, China)

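Institutions: Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology; School of Computer Science and Network Security, Dongguan University of Technology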

Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2018, Issue 35, pp. 162-168 (7 pages)

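Funding: National Natural Science Foundation of China (61602125, 61866008, 61862011, 61862012); Guangxi Natural Science Foundation (2016GXNSFBA380153, 2017GXNSFAA198192, 2018GXNSFAA138116); Guangxi Key Laboratory of Cryptography and Information Security project (GCIS201625, GCIS201704); Guilin University of Electronic Technology Graduate Education Innovation Program (2018YJCX51)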

Keywords: BiGRU-CRF; named entity recognition; ten ASEAN member states; knowledge graph

Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]


