Automatic extraction and structuration of soil–environment relationship information from soil survey reports (cited by: 8)


Abstract: In addition to soil samples, conventional soil maps, and experienced soil surveyors, text about soils (e.g., soil survey reports) is an important potential data source for extracting soil–environment relationships. Because the words describing soil–environment relationships are often mixed with unrelated words, the first step is to extract the needed words and organize them in a structured way. This paper applies natural language processing (NLP) techniques to automatically extract and structure information on soil–environment relationships from soil survey reports. The method includes two steps: (1) construction of a knowledge frame and (2) information extraction using either a rule-based method or a statistic-based method, depending on the type of information. For information written in a uniform style, the rule-based approach was used; these variables include slope, elevation, accumulated temperature, annual mean temperature, annual precipitation, and frost-free period. For information written in diverse styles, the statistic-based method was adopted; these variables include landform and parent material. The soil species of China soil survey reports were selected as the experimental dataset. Precision (P), recall (R), and F1-measure (F1) were used to evaluate the performance of the method. For the rule-based method, the P values were 1, the R values were above 92%, and the F1 values were above 96% for all the variables involved. For the method based on conditional random fields (CRFs), the P, R, and F1 values for parent material were 84.15, 83.13, and 83.64%, respectively; the values for landform were 88.33, 76.81, and 82.17%, respectively. To explore the impact of text type on the performance of the CRFs-based method, CRFs models were trained and validated separately on the descriptive texts of soil types and of typical profiles. For parent material, the maximum F1 value for the descriptive text of soil types was 90.7%, while the maximum F1 value for the descriptive text of soil profiles was only 75%. For landform, the maximum F1 value for the descriptive text of soil types was 85.33%, which was similar to that for the descriptive text of soil profiles (85.71%). These results suggest that NLP techniques are effective for the extraction and structuration of soil–environment relationship information from a text data source.
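The abstract reports P, R, and F1 without stating how F1 is computed. The sketch below assumes the standard definition, F1 = 2PR / (P + R), and checks that it reproduces the F1 values reported for the CRFs-based method. The function name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is in the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was extracted correctly
    return 2 * precision * recall / (precision + recall)


# P and R (in percent) as reported in the abstract for the CRFs-based method.
reported = {
    "parent material": (84.15, 83.13),
    "landform": (88.33, 76.81),
}

for variable, (p, r) in reported.items():
    print(f"{variable}: F1 = {f1_score(p, r):.2f}%")
# prints:
# parent material: F1 = 83.64%
# landform: F1 = 82.17%
```

Both computed values match the reported F1 figures (83.64% and 82.17%), which is consistent with the assumed definition.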

Authors: WANG De-sheng, LIU Jun-zhi, ZHU A-xing, WANG Shu, ZENG Can-ying, MA Tianwu

Affiliations: Key Laboratory of Virtual Geographic Environment; State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province); Jiangsu Center for Collaborative Innovation in Geographic Information Resource Development and Application; State Key Laboratory of Resources and Environmental Information System; Department of Geography

Source: Journal of Integrative Agriculture (农业科学学报（英文版）); indexed in SCIE, CAS, CSCD; 2019, Issue 2, pp. 328-339 (12 pages)

Funding: Supported by the National Natural Science Foundation of China (41431177 and 41601413), the National Basic Research Program of China (2015CB954102), the Natural Science Research Program of Jiangsu Province, China (BK20150975 and 14KJA170001), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China.

Keywords: soil–environment relationship; text; natural language processing; extraction; structuration

Classification number: S159 [Agricultural Sciences: Soil Science]

