Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning

Abstract: Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research, as they contain rich data and textual information. With the rapid development of science and technology, a large number of textual reports have accumulated in the field of geology. However, mainstream geoscience databases used for geological information mining neglect many less popular topics and non-English-speaking regions, which makes it harder for some researchers to extract the information they need from these texts. Natural Language Processing (NLP) has clear advantages in processing large amounts of textual data. The objective of this paper is to identify geological named entities in Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations. The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors. Finally, we conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. Our experimental results show that the proposed model achieves the highest F1 score, 80.64%, in comparison with the four baseline algorithms, demonstrating the reliability and robustness of the model for Named Entity Recognition of geological texts.
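The abstract reports its result as an F1 score. As a minimal sketch of how an entity-level F1 is commonly computed for NER, the Python below extracts entity spans from BIO tag sequences and scores exact span matches. The record does not state the paper's evaluation protocol, so the BIO scheme, the exact-match criterion, and the tag names `ROCK` and `LOC` are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        # An I- tag continues the open entity only if its type matches.
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag is treated leniently as the start of an entity.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]):
    """Return (precision, recall, F1) over exact-match entity spans."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold_spans |= {(idx,) + s for s in bio_to_spans(g)}
        pred_spans |= {(idx,) + s for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical tags: ROCK and LOC are illustrative, not the GNER label set.
gold = [["B-ROCK", "I-ROCK", "O", "B-LOC"]]
pred = [["B-ROCK", "I-ROCK", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(precision, recall, round(f1, 4))
```

In this toy case the prediction finds one of the two gold entities and proposes no false ones, so precision is 1.0 and recall is 0.5. Scoring whole spans rather than individual tokens penalizes partially recognized entities, which is why entity-level F1 is the usual choice for NER benchmarks.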

Authors: Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou

Affiliations: State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation; Key Laboratory of Deep-Time Geography and Environment Reconstruction and Applications of Ministry of Natural Resources; School of Information Engineering

Source: Journal of Earth Science (地球科学学刊（英文版）), indexed in SCIE, CAS, and CSCD, 2024, Issue 3, pp. 1035-1043 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (Nos. 42488201, 42172137, 42050104, and 42050102), the National Key R&D Program of China (No. 2023YFF0804000), and the Sichuan Provincial Youth Science & Technology Innovative Research Group Fund (No. 2022JDTD0004).

Keywords: Prompt Learning; Named Entity Recognition (NER); low resource; geological text; text information mining; big data; geology

Classification: H31 [Language and Linguistics: English]
