The random forest model is general-purpose and easy to interpret, and it is widely used for classification and prediction. However, it integrates all decision trees non-selectively and decides the final result by majority vote, so differences between the trees are ignored and prediction accuracy suffers. To address these defects, an improved random forest model based on the confusion matrix (CM-RF) is proposed. During model construction, the decision tree cluster is built selectively using a similarity measure; in the final voting stage, the result is produced by a dynamic weighted voting fusion method. Experiments show that the proposed CM-RF reduces the impact of low-performance decision trees on the output, thereby improving the accuracy and generalization ability of the random forest model.
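The contrast between majority voting and weighted voting can be illustrated with a minimal sketch. The abstract does not specify how CM-RF derives its dynamic weights from the confusion matrix, so the weights here are simply assumed to be each tree's validation accuracy; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting trees carry the largest total weight.

    predictions: one predicted label per tree.
    weights: one non-negative weight per tree (assumed here to be the
             tree's validation accuracy; CM-RF's actual weighting may differ).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


def majority_vote(predictions):
    """Plain majority rule: every tree has the same weight."""
    return weighted_vote(predictions, [1.0] * len(predictions))


# Hypothetical forest: three weak trees agree on "a", two strong trees on "b".
tree_predictions = ["a", "a", "a", "b", "b"]
tree_accuracies = [0.52, 0.55, 0.51, 0.93, 0.90]

majority_label = majority_vote(tree_predictions)
weighted_label = weighted_vote(tree_predictions, tree_accuracies)
```

In this toy case the majority rule follows the three weak trees, while the weighted vote follows the two accurate ones, which is the effect the abstract attributes to reducing the influence of low-performance trees.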
As society has developed, industries have generated ever larger amounts of data. The random forest algorithm is widely used as a classification algorithm because of its strong performance. However, when generating feature subspaces it selects features by simple random sampling, which cannot distinguish redundant features and therefore lowers classification accuracy; in stand-alone mode its computational efficiency is also low. In response to these problems, this paper carries out optimization research based on Spark. The improved random forest algorithm extracts features according to computed feature importance to form the feature subspace. When generating the random forest model, it selects decision trees based on their similarity and classification accuracy. Experimental results show that, compared with the original random forest algorithm, the improved algorithm achieves higher classification accuracy and classifies data effectively.
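Selecting trees by similarity and accuracy could look roughly like the sketch below. The abstract does not give the selection rule, so this assumes a simple greedy scheme: trees are visited from most to least accurate, and a tree is kept only if its prediction agreement with every tree already kept stays at or below a threshold. The names `select_trees` and `max_similarity`, and the validation data, are hypothetical.

```python
def accuracy(preds, truth):
    """Fraction of validation samples a tree classifies correctly."""
    return sum(1 for p, t in zip(preds, truth) if p == t) / len(truth)


def agreement(preds_a, preds_b):
    """Fraction of validation samples on which two trees predict the same label."""
    return sum(1 for a, b in zip(preds_a, preds_b) if a == b) / len(preds_a)


def select_trees(tree_preds, truth, max_similarity=0.9):
    """Greedy selection (an assumed rule, not the paper's exact method).

    Visit trees from most to least accurate and keep a tree only if it is
    not too similar to any tree that has already been kept.
    """
    order = sorted(
        range(len(tree_preds)),
        key=lambda i: accuracy(tree_preds[i], truth),
        reverse=True,
    )
    kept = []
    for i in order:
        if all(agreement(tree_preds[i], tree_preds[j]) <= max_similarity for j in kept):
            kept.append(i)
    return kept


# Hypothetical validation labels and per-tree predictions.
truth = [0, 1, 1, 0, 1, 0, 1, 0]
tree_preds = [
    [0, 1, 1, 0, 1, 0, 1, 1],  # tree 0: 7 of 8 correct
    [0, 1, 1, 0, 1, 0, 1, 1],  # tree 1: identical to tree 0, hence redundant
    [0, 1, 0, 0, 1, 0, 0, 0],  # tree 2: 6 of 8 correct, different errors
    [1, 0, 1, 0, 1, 1, 1, 0],  # tree 3: 5 of 8 correct
]
selected = select_trees(tree_preds, truth)
```

Here tree 1 is dropped because it duplicates tree 0, while the less accurate but more diverse trees 2 and 3 are retained.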
Urban and community forestry is a specialized discipline concerned with managing trees and forests in urban, suburban, and town environments. The field often entails extensive civic involvement and collaborative partnerships with institutions. Its objectives range from preserving water quality, habitat, and biodiversity to mitigating the Urban Heat Island (UHI) effect. The UHI phenomenon, in which urban areas are notably warmer than their rural counterparts because urban infrastructure absorbs heat and urban forest cover is limited, is the focus of this study. The study develops a methodological framework that integrates Geographically Weighted Regression (GWR), Random Forest (RF), and Suitability Analysis to assess the UHI effect across different urban zones and to identify areas with varying levels of UHI impact. The framework is designed to help urban planners and designers understand the spatial distribution of UHI and identify areas where urban forestry initiatives can be strategically implemented to mitigate its effect. Conducted in various London areas, the research analyzes the relationship between urban and community forestry and UHI. By mapping the spatial variability of UHI, the framework offers a novel approach to enhancing urban environmental design and advancing urban forestry studies. The findings are expected to give urban planners and policymakers insights that support informed decision-making in urban forestry management and help create healthier, more livable urban environments.
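To address the high dimensionality, complex relationships, and anomalies found in multi-source volatile organic compound (VOC) data, a VOC data-cleaning model is built on a hybrid of kernel principal component analysis (KPCA), isolation forest (IF), and weighted random forest (WRF). First, the study area is divided into a grid and a KPCA-IF model is built for dimensionality reduction and anomaly identification: KPCA reduces the dimensionality of the multi-source mixed VOC data, and the IF algorithm identifies anomalous records, which are then removed. Next, a WRF-based VOC data compensation algorithm fills missing values by regression in the dataset produced by dimensionality reduction and anomaly identification. Finally, taking Xi'an as a case study, multi-source VOC data including air quality data and meteorological data are selected for cleaning. The results show that the hybrid model effectively reduces the dimensionality of multi-source VOC data; its data cleaning yields a mean absolute error of 5.08, a root mean square error of 10.24, and a median absolute error of 3.54, all better than the comparison models. This indicates that the KPCA-IF-WRF hybrid model is more robust and more accurate, and that the approach is scientifically sound and feasible.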
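For reference, the three error measures reported above are usually defined as follows. The abstract does not state its formulas, so these are the standard definitions, with y_i the true value, ŷ_i the filled-in (predicted) value, and n the number of evaluated records; the abbreviation MedAE is introduced here for convenience.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2},
\qquad
\mathrm{MedAE} = \operatorname{median}_{1 \le i \le n} \lvert y_i - \hat{y}_i \rvert
```

Under these definitions RMSE is never smaller than MAE, which is consistent with the reported values (10.24 versus 5.08). A median absolute error below the mean absolute error suggests that a minority of records carry comparatively large errors.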
Various methods have been used to estimate above-ground forest biomass across landscapes and to create biomass maps for specific stands or pixels across ownership or project areas. Without an accurate estimation method, land managers may end up with incorrect biomass maps, which could lead to poorer decisions in future management plans. The goal of this study was to compare imputation methods for predicting forest biomass and basal area at a project planning scale. A combination of ground inventory plots, light detection and ranging (LiDAR) data, satellite imagery, and climate data was analyzed, and the root mean square error (RMSE) and bias of each method were calculated. For biomass prediction, k-nn (k = 5) had the lowest RMSE and the least bias, followed by k-nn (k = 3), the GWR model, and random forest imputation. For basal area prediction, the GWR model had the lowest RMSE and the least bias, followed by k-nn (k = 5), k-nn (k = 3), and the random forest method. For both metrics, the GNN method was the least accurate based on the ranking of RMSE and bias.
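A minimal sketch of k-nn imputation and the two evaluation measures follows. It assumes plain Euclidean distance in predictor space and an unweighted mean of the k nearest reference plots; the study may use a different distance or weighting, and all data values here are hypothetical.

```python
import math


def knn_impute(target, references, k):
    """Average the response of the k reference plots nearest to `target`.

    references: list of (predictor_vector, measured_response) pairs,
                e.g. remote-sensing predictors and plot-level biomass.
    Distance is Euclidean, which is an assumption of this sketch.
    """
    ranked = sorted(references, key=lambda ref: math.dist(target, ref[0]))
    nearest = ranked[:k]
    return sum(response for _, response in nearest) / len(nearest)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def bias(observed, predicted):
    """Mean signed error; positive values indicate over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical reference plots: (predictor vector, measured response).
references = [
    ((0.0, 0.0), 10.0),
    ((1.0, 0.0), 12.0),
    ((0.0, 1.0), 14.0),
    ((5.0, 5.0), 40.0),
]
targets = [(0.1, 0.1), (4.0, 4.0)]
observed = [11.0, 30.0]

predicted = [knn_impute(t, references, k=3) for t in targets]
error_rmse = rmse(observed, predicted)
error_bias = bias(observed, predicted)
```

Comparing methods as in the abstract then amounts to computing `rmse` and `bias` for each method's predictions on the same held-out plots and ranking the results.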
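To address the uncertainty in the number of classes used for geological hazard susceptibility evaluation factors, the fuzzy cover approach for clustering based on adaptive inflation factor (AIFFC) is introduced to optimize factor classification. Taking Xiangxiang City, Hunan Province, as the study area, nine types of evaluation factors were extracted: slope, aspect, elevation, mean annual rainfall, normalized difference vegetation index, roads, faults, lithology, and land use. AIFFC and natural breakpoint classification (NBC) were each used to classify the continuous factors, and the results were fed into a weighted information value model and a random forest model to produce susceptibility zoning maps of the study area. The superiority of AIFFC classification was tested using single-factor classification accuracy, hazard-to-area ratio analysis, and the susceptibility zoning results. The results show that, for every factor, the AUC obtained with AIFFC classification is higher than that obtained with the natural breakpoint method. For the AIFFC-based random forest model and weighted information value model, the hazard-to-area ratio of the high-susceptibility zone increased by 56.3% and 74.6%, respectively; that of the low-susceptibility zone decreased by 48% and 58.1%, respectively; and the AUC increased by 7.6% and 2.7%, respectively. The AIFFC classification method optimizes the classification of geological hazard susceptibility evaluation factors and significantly improves the reasonableness of geological hazard susceptibility evaluation.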
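The AUC used to compare the classification schemes can be computed directly from its pairwise definition: the probability that a randomly chosen hazard location receives a higher susceptibility score than a randomly chosen non-hazard location. The sketch below is a generic illustration with hypothetical scores and labels, not the study's evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted susceptibility per location.
    labels: 1 where a hazard was observed, 0 otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores and observed hazard occurrences.
susceptibility = [0.9, 0.8, 0.4, 0.35, 0.1]
hazard_observed = [1, 0, 1, 0, 0]
auc_value = auc(susceptibility, hazard_observed)
```

The pairwise loop is quadratic in the number of samples, so for large rasters a rank-based formulation would be the more practical choice.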
Funding (CM-RF random forest paper): Science Research Project of Gansu Provincial Transportation Department (No. 2017-012).
Funding (Spark-based improved random forest paper): This work is partially supported by the Social Science Foundation of Hebei Province (No. HB19JL007) and the Education Technology Foundation of the Ministry of Education (No. 2017A01020).