
Mineral resource entity recognition considering multiple features of Chinese characters (cited by: 1)
Abstract Geological reports on mineral resources contain a large amount of expert experience and basic geological knowledge. Extracting structured knowledge quickly and accurately from massive mineral resource texts has become an active research topic, and named entity recognition (NER) is a key step in information extraction and knowledge mining. Entities in mineral resource geological texts are long, rich in technical terms, and often nested, so existing deep-learning NER methods perform poorly when applied directly to this domain. This paper proposes a deep learning model for mineral resource NER: ALBERT (A Lite Bidirectional Encoder Representations from Transformers)-BiLSTM (Bi-directional Long Short-Term Memory)-CRF (Conditional Random Field). The ALBERT pre-trained language model captures the rich semantic features of geological text, and these are combined with the pinyin, glyph, and word-boundary features of Chinese characters to form the embedding layer, which improves the recognition of complex entities. Experiments were run on the People's Daily dataset, the Resume dataset, and a self-constructed mineral resources dataset. The results show that the proposed method reaches 70.97%, 64.33%, and 67.49% in precision, recall, and F1 score, respectively.
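The multi-feature embedding described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the features are encoded or combined, so everything here is an assumption: word boundaries are encoded with a BMES scheme as one-hot vectors, fusion is plain concatenation, and the function names `boundary_tags`, `one_hot`, and `fuse` are hypothetical. The character vectors are placeholders standing in for ALBERT outputs; pinyin and glyph features could be appended in the same way through `extra_features`.

```python
from typing import List, Optional, Sequence

# Assumed tagging scheme: Begin / Middle / End of a word, or Single-character word.
BOUNDARY_TAGS = ["B", "M", "E", "S"]


def boundary_tags(words: Sequence[str]) -> List[str]:
    """Label every character with its position inside its segmented word."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens from the segmenter
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def one_hot(tag: str) -> List[float]:
    """Encode a boundary tag as a 4-dimensional one-hot vector."""
    return [1.0 if tag == t else 0.0 for t in BOUNDARY_TAGS]


def fuse(
    char_vectors: Sequence[Sequence[float]],
    words: Sequence[str],
    extra_features: Optional[Sequence[Sequence[float]]] = None,
) -> List[List[float]]:
    """Concatenate each character vector with its boundary one-hot
    and, optionally, further per-character features (e.g. pinyin, glyph)."""
    tags = boundary_tags(words)
    if len(tags) != len(char_vectors):
        raise ValueError("one vector is required per character")
    fused: List[List[float]] = []
    for i, (vec, tag) in enumerate(zip(char_vectors, tags)):
        row = list(vec) + one_hot(tag)
        if extra_features is not None:
            row += list(extra_features[i])
        fused.append(row)
    return fused


# Hypothetical segmented phrase: six characters in three words.
words = ["斑岩", "型", "铜矿床"]
placeholder_vectors = [[0.5] * 8 for _ in range(6)]  # stand-in for ALBERT outputs
fused = fuse(placeholder_vectors, words)
```

In this sketch each fused row has 12 dimensions (8 placeholder dimensions plus 4 boundary dimensions). In a model of the kind the abstract describes, such rows would be fed to the BiLSTM layer, with the CRF layer decoding the final label sequence.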
Authors Liu Zhihao; Jin Xiangguo; Qiu Qinjun; Tao Liufeng; Huang Zhen; Xie Zhong (National Engineering Research Center of Geographic Information System, Wuhan 430074; School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074; Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong 518034; National and Local Joint Engineering Laboratory of Geographic Information System, Wuhan 430074)
Source Chinese Journal of Geology (Scientia Geologica Sinica), 2023, No. 4, pp. 1535-1553 (19 pages). Indexed in: CAS, CSCD, Peking University Core Journals.
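Funding Supported by the National Key Research and Development Program of China (No. 2022YFF0711601), the Natural Science Foundation of Hubei Province (No. 2022CFB640), the China Postdoctoral Science Foundation (No. 2021M702991), the Director's Fund of the Key Laboratory of Geological Survey and Evaluation, Ministry of Education (No. GLAB2023ZR01), and the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014).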
Keywords Mineral resources report; Named entity recognition; Pre-trained model; Multi-feature fusion
