Machine learning methods are increasingly used for spatially predicting a categorical target variable when spatially exhaustive predictor variables are available within the study region. Even though these methods exhibit competitive spatial prediction performance, by construction they do not exactly honor the categorical target variable's observed values at sampling locations. Competing geostatistical methods, on the other hand, match the observed values at sampling locations perfectly by design. In many geoscience applications, it is desirable to match the observed values of the categorical target variable at sampling locations perfectly, especially when its measurements can reasonably be considered error-free. This paper addresses the problem of exact conditioning of machine learning methods for the spatial prediction of categorical variables. It introduces a classification random forest-based approach in which the categorical target variable is exactly conditioned to the data, and which therefore shares the exact conditioning property of competing geostatistical methods. The proposed method extends previous work dedicated to continuous target variables by using an implicit representation of the categorical target variable. The basic idea is to transform the ensemble of (categorical) classification tree predictions resulting from the traditional classification random forest into an ensemble of (continuous) signed distances associated with each category of the categorical target variable. An orthogonal representation of the ensemble of signed distances is then created through principal component analysis, which allows the exact conditioning problem to be reformulated as a system of linear inequalities on principal component scores. New principal component scores that ensure exact conditioning to the data are then sampled via randomized quadratic programming. The resulting conditional signed distances are turned back into an ensemble of categorical outputs, which perfectly honor the categorical target variable's observed values at sampling locations. Finally, a majority vote is used to aggregate the ensemble of categorical outputs. The effectiveness of the proposed method is illustrated on a simulated dataset for which ground truth is available and showcased on a real-world dataset that includes geochemical data. A comparison with geostatistical and traditional machine learning methods shows that the proposed technique can perfectly match the categorical target variable's observed values at sampling locations while maintaining competitive out-of-sample predictive performance.
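The first and last steps of the pipeline described above (categorical predictions to signed distances and back, followed by a majority vote) can be illustrated with a minimal sketch. It assumes a one-dimensional regular grid, integer category codes, and the convention that signed distances are negative inside a category and positive outside; none of these choices comes from the paper. The PCA and randomized quadratic programming steps that perform the actual conditioning are not shown. The function names `signed_distances`, `decode`, and `majority_vote` are hypothetical.

```python
import numpy as np

def signed_distances(labels, categories):
    """Signed distance of each 1-D grid cell to every category.

    Assumed convention: negative inside the category, positive outside.
    """
    labels = np.asarray(labels)
    idx = np.arange(labels.size)
    gaps = np.abs(idx[:, None] - idx[None, :]).astype(float)  # cell-to-cell distances
    out = np.empty((len(categories), labels.size))
    for k, c in enumerate(categories):
        inside = labels == c
        # Distance to the nearest cell inside / outside the category (inf if none).
        to_inside = np.where(inside[None, :], gaps, np.inf).min(axis=1)
        to_outside = np.where(~inside[None, :], gaps, np.inf).min(axis=1)
        out[k] = np.where(inside, -to_outside, to_inside)
    return out

def decode(distances, categories):
    """Turn signed distances back into categories (smallest distance wins)."""
    return np.asarray(categories)[distances.argmin(axis=0)]

def majority_vote(ensemble, n_categories):
    """Aggregate an (n_members, n_cells) array of category codes per cell."""
    counts = np.stack([(ensemble == c).sum(axis=0) for c in range(n_categories)])
    return counts.argmax(axis=0)  # ties go to the lowest category code

# Toy ensemble of three "tree" predictions on a 7-cell grid (made-up values).
categories = [0, 1, 2]
ensemble = np.array([
    [0, 0, 1, 1, 1, 2, 2],
    [0, 1, 1, 1, 2, 2, 2],
    [0, 0, 0, 1, 1, 2, 2],
])
sd = signed_distances(ensemble[0], categories)
roundtrip = decode(sd, categories)
vote = majority_vote(ensemble, len(categories))
```

In this sketch the round trip through signed distances recovers the original categories, which is what makes it possible to work on continuous quantities in between and still return a categorical map.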
Fires play a noteworthy role in ecological and environmental losses in Mediterranean forests. In addition to ecological impacts, fire may bring about economic, social, and cultural changes. The detection of fire scars is critically important for reducing losses. In the present study, forest fires recorded in Antalya, one of the most important ecological and tourist regions of the Western Mediterranean, were clustered and mapped. Since the dominant factors and devastation records derived from the cases had nominal-scaled properties, a nonparametric clustering algorithm for categorical data was used in this evaluation. The proposed tool, the k-modes algorithm, uses modes instead of means for clustering. The algorithm can be implemented quickly and makes no distributional assumptions about the available data. It uses a frequency-based method to update the modes of the fires. The modes derived from the maps may be useful information for local authorities in managing fires. In conclusion, the proposed nonparametric clustering procedure may be employed to build a decision-support system to monitor and identify fire activities and to enhance fire management efficiency.
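The k-modes idea (cluster centers are modes, updated by attribute frequencies) can be sketched as follows. This is a minimal illustration assuming the simple matching (Hamming) dissimilarity and records given as tuples; the attribute values, the `init` parameter, and the function names are hypothetical and are not taken from the study.

```python
from collections import Counter
import random

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(records):
    """Most frequent category of each attribute (frequency-based mode update)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, k, init=None, n_iter=20, seed=0):
    """Cluster categorical records; returns (labels, modes)."""
    rng = random.Random(seed)
    # Initial modes: supplied by the caller, else drawn at random from the data.
    modes = [tuple(m) for m in (init if init is not None else rng.sample(records, k))]
    labels = [0] * len(records)
    for _ in range(n_iter):
        # Assignment step: nearest mode under the matching dissimilarity.
        labels = [min(range(k), key=lambda j: hamming(r, modes[j])) for r in records]
        # Update step: recompute each mode from its members (keep it if empty).
        new_modes = []
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_modes.append(column_modes(members) if members else modes[j])
        if new_modes == modes:  # converged
            break
        modes = new_modes
    return labels, modes

# Made-up fire records: (cause, season, damage class).
records = [
    ("human", "summer", "high"),
    ("human", "summer", "high"),
    ("human", "summer", "medium"),
    ("lightning", "spring", "low"),
    ("lightning", "spring", "low"),
    ("lightning", "autumn", "low"),
]
labels, modes = k_modes(records, k=2, init=[records[0], records[3]])
```

Each resulting mode is itself a readable record (for example a typical cause, season, and damage class), which is what makes the cluster centers directly interpretable on a map.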
Spatial modeling of ore grades is frequently affected by local variation in geological domains such as lithological characteristics, rock types, and geological formations. Disregarding this information may lead to biased results in the final ore grade block model, subsequently impacting the downstream processes in a mining chain project. In the current practice of ore body evaluation, known as stochastic cascade/hierarchical geostatistical modeling, the geological domain is first characterized, and the ore grades of interest are then evaluated within the geological model. This practice may be unrealistic when the variability in ore grade across the boundary is gradual, following a smooth transition. To reproduce such characteristics, the conventional joint simulation of continuous and categorical variables considers the cross dependence that exists between the ore grade and the geological formations. However, this approach considers only one ore variable and ignores its relationship with other ore grades that may be available at the sample location. This study proposes an alternative approach to jointly model two cross-correlated ore grades and one categorical variable (i.e., geological domains) with soft contact relationships among the geological domains. Statistical and geostatistical tools are provided for variogram inference, Gibbs sampling, and conditional cosimulation. The algorithm is also tested on a Cu deposit, where the geological formations are controlled by the local and spatial distribution of two cross-correlated ore grades, Cu and Au, throughout the deposit. The results show that the proposed algorithm outperforms other geostatistical techniques in terms of global and local reproduction of statistical parameters.
Funding: The first author is thankful to Nazarbayev University for funding this work via Faculty Development Competitive Research Grants for 2018-2020 under Contract No. 090118FD5336 and for 2021-2023 under Contract No. 021220FD4951.