
Preliminary Study on Machine Learning-Based Intelligent Recognition of Historical Climate Reconstruction Papers and Data Mining
Abstract: Integrating the many existing single-proxy reconstructions into composite reconstructions of historical climate change is an active research topic, and doing so first requires collecting the relevant published reconstruction papers. Against this background, this study explores a machine learning-based approach for automatically identifying historical climate reconstruction papers within the very large body of climate change literature and for extracting key information from them. First, 1450 abstracts of published paleoclimate reconstruction papers were manually labeled, one by one, as either millennium-scale reconstructions or other reconstructions. This labeled set served as sample data for training and testing nine commonly used machine learning classification models; the Extra Trees (extremely randomized trees) model achieved higher classification accuracy than the other models on this type of text. Second, the Extra Trees model was applied to more than 700,000 abstracts of climate change research papers from ResearchGate, from which 6039 abstracts of millennium-scale climate reconstruction papers were selected. The reliability of this selection was checked by comparing the word cloud of the 6039 abstracts with that of the sample dataset. Finally, named-entity recognition was applied to the 6039 abstracts to mine information along three dimensions: reconstructed climate elements, proxy data types, and target regions (countries). Keyword frequencies show that temperature and precipitation are the two most frequently reconstructed climate elements, and that tree rings, historical documents, and sediments (including pollen) are the three most frequently used proxy data types, which is broadly consistent with expert experience in the field. The frequencies of reconstructed climate elements, of proxy data types, and of their combinations also show distinct geographical differences, which the authors relate to regional climate characteristics.
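The three-dimension frequency count described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper uses named-entity recognition, whereas the sketch below substitutes a simple dictionary lookup as a stand-in. The term lists in `GAZETTEER`, the function names `tag_abstract` and `count_terms`, and the three sample sentences are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only; the paper's actual
# entity vocabulary and NER model are not given in the abstract.
GAZETTEER = {
    "climate_element": ["temperature", "precipitation"],
    "proxy": ["tree ring", "historical document", "sediment", "pollen"],
    "region": ["china", "europe"],
}


def tag_abstract(text):
    """Return {dimension: set of matched terms} for one abstract."""
    lowered = text.lower()
    found = {}
    for dimension, terms in GAZETTEER.items():
        hits = set()
        for term in terms:
            # Accept "tree ring" or "tree-ring", plus a simple plural "s".
            words = [re.escape(word) for word in term.split()]
            pattern = r"\b" + r"[\s-]+".join(words) + r"s?\b"
            if re.search(pattern, lowered):
                hits.add(term)
        found[dimension] = hits
    return found


def count_terms(abstracts):
    """Count term frequencies per dimension and element/proxy pairs."""
    counts = {dimension: Counter() for dimension in GAZETTEER}
    pairs = Counter()
    for text in abstracts:
        tags = tag_abstract(text)
        for dimension, hits in tags.items():
            counts[dimension].update(hits)
        # Each abstract contributes at most once to any (element, proxy) pair.
        for element in tags["climate_element"]:
            for proxy in tags["proxy"]:
                pairs[(element, proxy)] += 1
    return counts, pairs


# Toy sentences written for this sketch, not taken from the paper's corpus.
abstracts = [
    "A tree-ring based temperature reconstruction for China over the past millennium.",
    "Historical documents reveal precipitation variability in eastern China.",
    "Pollen records from lake sediments indicate temperature changes in Europe.",
]
counts, pairs = count_terms(abstracts)
```

Counting each term at most once per abstract keeps a long abstract from dominating the frequencies, and grouping the `pairs` counter by region would give the kind of geographical comparison the abstract mentions. A trained NER model would replace `tag_abstract` in a real pipeline.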
Authors: HUA Mengmeng (华萌萌); YIN Jun (尹君); HU Zhaoling (胡召玲); ZHANG Xuezhen (张学珍) (Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101; Jiangsu Normal University, Xuzhou 221116, Jiangsu; University of Chinese Academy of Sciences, Beijing 100049)
Source: Quaternary Sciences (《第四纪研究》), 2021, No. 2, pp. 550-561 (12 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
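Funding: Jointly supported by the National Key Research and Development Program of China (Grant No. 2017YFA0603301) and the Strategic Priority Research Program (Class A) of the Chinese Academy of Sciences (Grant No. XDA19040101).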
Keywords: historical climate; climate reconstruction; text classification; data mining; machine learning