
Named Entity Recognition of News Texts in Ten ASEAN Countries (cited by: 8)
Abstract: Building a knowledge graph of the ten ASEAN member states requires named entity recognition on the relevant texts. A neural network model based on a bidirectional GRU-CRF is designed to recognize named entities in Chinese news data from the Chinese embassies in the ten ASEAN member states. Taking pre-trained domain word vectors as input, a bidirectional GRU network extracts semantic features from the vectorized text, and a CRF layer then predicts and outputs the optimal tag sequence. To further improve the results, two hidden layers are added between the bidirectional GRU and the CRF layer. For data preprocessing, a dataset partitioning algorithm is proposed so that the text is divided in a more scientifically sound and reasonable way. On the ten-ASEAN-country dataset, the model is compared with several hybrid models, and the results show that the proposed model achieves better recognition performance on person names, location names, and organization names.
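The abstract says the CRF layer "predicts and outputs the optimal tag sequence" but gives no decoding details. A minimal sketch of one standard way to do this for a linear-chain CRF, Viterbi decoding, is shown below. It is an illustration under stated assumptions, not the authors' implementation: the function name `viterbi_decode`, the three-tag set, and all scores are hypothetical, and the emission scores are simply assumed to come from the bidirectional GRU and the hidden layers.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (assumed to be produced
                        by the BiGRU and hidden layers; hypothetical here).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] = score of the best path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the score of entering tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(num_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-PER", "I-PER"]
transitions = [
    [0.0, 0.0, -10.0],  # from O: jumping straight to I-PER is penalized
    [0.0, -1.0, 1.0],   # from B-PER
    [0.0, -1.0, 0.5],   # from I-PER
]
emissions = [
    [0.1, 2.0, 0.3],  # token 0
    [0.5, 0.2, 0.6],  # token 1
    [1.5, 0.1, 0.2],  # token 2
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

With these made-up scores the decoded sequence is B-PER, I-PER, O. The transition matrix is what lets a CRF layer discourage invalid label sequences such as O followed directly by I-PER, which per-token classification alone cannot do.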
Authors: ZHENG Yan-bin (郑彦斌) 1; XIA Zhi-chao (夏志超) 1; GUO Zhi (郭智) 1; HUANG Yong-zhong (黄永忠) 1; LIU Wen-fen (刘文芬) 1, 2 are not individually mapped to affiliations in the source; the listed affiliations are: (1) Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin 541004, China; (2) School of Computer Science and Network Security, Dongguan University of Technology, Dongguan 523808, China
Source: Science Technology and Engineering (《科学技术与工程》), PKU Core journal, 2018, No. 35, pp. 162-168 (7 pages)
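Funding: National Natural Science Foundation of China (61602125, 61866008, 61862011, 61862012); Guangxi Natural Science Foundation (2016GXNSFBA380153, 2017GXNSFAA198192, 2018GXNSFAA138116); Guangxi Key Laboratory of Cryptography and Information Security project (GCIS201625, GCIS201704); Guilin University of Electronic Technology Graduate Education Innovation Program project (2018YJCX51)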
Keywords: BiGRU-CRF; named entity recognition; ten ASEAN member states; knowledge graph
