Funding: Supported by projects of the National Natural Science Foundation of China (Nos. 41272360, 41472299, 41672322)
Abstract: Constructing a statistical model that best fits the background is a key step in geochemical anomaly identification, but such a model is hard to construct when the sample population has an unknown and/or complex distribution. Isolation forest is an outlier detection approach that explicitly isolates anomalous samples rather than modeling the population distribution. It can extract multivariate anomalies from very large, high-dimensional datasets whose population distribution is unknown. For this reason, we tentatively applied the method to identify multivariate anomalies in the stream sediment survey data of the Lalingzaohuo district, an area with a complex geological setting in Qinghai Province, China. The performance of the isolation forest algorithm in anomaly identification was compared with that of a continuous restricted Boltzmann machine. The results show that the isolation forest model outperforms the continuous restricted Boltzmann machine in multivariate anomaly identification in terms of the receiver operating characteristic curve, the area under the curve, and data-processing efficiency. The anomalies identified by the isolation forest model occupy 19% of the study area and contain 82% of the known mineral deposits, whereas the anomalies identified by the continuous restricted Boltzmann machine occupy 35% of the study area and contain 88% of the known mineral deposits. Processing the dataset takes 4.07 seconds with the isolation forest and 279.36 seconds with the continuous restricted Boltzmann machine. Therefore, isolation forest is a useful anomaly detection method that can quickly extract multivariate anomalies from geochemical exploration data.
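The isolation idea described in this abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the parameter defaults (100 trees, subsample size 64), and the synthetic three-element data are all assumptions made for illustration. Each tree splits a random subsample on a randomly chosen element at a random cut value, and samples that are isolated after few splits receive scores closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random element (column) at a random cut value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # cannot split a constant column
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < cut]
    right = [r for r in rows if r[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, row, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if row[tree["dim"]] < tree["cut"] else tree["right"]
    return path_length(child, row, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1) for each row; needs at least 2 rows (assumed defaults)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = max(1, math.ceil(math.log2(sample_size)))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2.0 ** (-sum(path_length(t, r) for t in trees) / (n_trees * norm))
        for r in rows
    ]


# Synthetic stand-in for stream sediment data: 200 background samples with
# three hypothetical element values each, plus one obvious multivariate outlier.
data_rng = random.Random(42)
background = [tuple(data_rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(200)]
samples = background + [(10.0, 10.0, 10.0)]
scores = isolation_scores(samples)
ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
print("indices of the five highest-scoring samples:", ranked[:5])
```

Because no distribution is fitted, the only tuning choices in this sketch are the number of trees and the subsample size. In practice a tested library implementation would normally be preferred over hand-written trees.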
Funding: Supported by projects of the National Natural Science Foundation of China (Nos. 41272360, 41472299, 61133011)
Abstract: Model performance assessment is a key procedure in mineral potential mapping, but relevant research is seldom reported in the literature. Cumulative gain and lift charts are well known in the data mining community that specializes in marketing and sales applications, and they are widely used to assess model performance in customer churn prediction. In this paper, they are introduced into the field of mineral potential mapping for model performance assessment. These two charts can be viewed as a graphic representation of the advantage of using a predictive model to choose mineral targets. A cumulative gain curve shows how much a predictive model is superior to a random guess in mineral target prediction. A lift chart shows how much more likely the mineral targets predicted by a model are to be deposit-bearing than targets chosen by random selection. As an illustration, the cumulative gain and lift charts are applied to measure the performance of weights of evidence, logistic regression, restricted Boltzmann machine, and multilayer perceptron models in mineral potential mapping in the Altay district in northern Xinjiang, China. The results show that the cumulative gain and lift charts can visually reveal that the first three models perform well while the last one performs poorly. Thus, the cumulative gain and lift charts can serve as a graphic tool for model performance assessment in mineral potential mapping.
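The two charts described in this abstract reduce to simple arithmetic on ranked predictions. The following sketch assumes one common definition: cumulative gain is the fraction of known deposits captured within the top-ranked fraction of cells, and lift is that gain divided by the fraction selected. The function name `gain_and_lift`, the toy scores, and the deposit labels are hypothetical and do not come from the paper.

```python
def gain_and_lift(scores, labels, n_bins=10):
    """Return (fraction_selected, cumulative_gain, lift) for each cumulative bin.

    scores: model output per map cell (higher means more prospective).
    labels: 1 if the cell contains a known deposit, else 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_deposits = sum(labels)
    rows = []
    for b in range(1, n_bins + 1):
        k = max(1, round(len(order) * b / n_bins))  # cells selected so far
        captured = sum(labels[i] for i in order[:k])
        fraction_selected = k / len(order)
        gain = captured / total_deposits
        lift = gain / fraction_selected  # 1.0 corresponds to random selection
        rows.append((fraction_selected, gain, lift))
    return rows


# Toy example: ten cells, three of which contain known deposits.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
chart = gain_and_lift(toy_scores, toy_labels, n_bins=5)
for fraction_selected, gain, lift in chart:
    print(f"top {fraction_selected:.0%}: gain={gain:.2f}, lift={lift:.2f}")
```

A random selection yields a gain equal to the fraction selected and therefore a lift of 1. The further a model's curve sits above that baseline in the early bins, the more efficiently it concentrates known deposits into a small target area.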