Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling poses many challenges for lithology identification. This paper addresses three problems in intelligent lithology identification: difficult extraction of feature information, low precision of thin-layer identification, and limited applicability of the model. The study seeks to improve the overall performance of the lithology identification model in three respects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model based on a dynamic felling strategy weighted random forest algorithm (DFW-RF) is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that significantly influence lithology identification. The overall performance of the DFW-RF lithology identification model was verified on 3 wells in different areas. In a comparison of the prediction results of five typical lithology identification algorithms, the DFW-RF model has a higher lithology identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model enables efficient real-time intelligent identification of lithologic information in closed-loop drilling and has broad applicability, making it a candidate for wide use in logging interpretation.
Precise and timely prediction of crop yields is crucial for food security and the development of agricultural policies. However, crop yield is influenced by multiple factors within complex growth environments. Previous research has paid relatively little attention to the interference of environmental factors and drought with the growth of winter wheat. Therefore, there is an urgent need for more effective methods to explore the inherent relationship between these factors and crop yield. This study used four types of indicators (meteorological, crop growth status, environmental, and drought index) from October 2003 to June 2019 in Henan Province as the basic data for predicting winter wheat yield. The accuracy of winter wheat yield estimation was calculated using the sparrow search algorithm combined with random forest (SSA-RF) under different input indicators. The estimation accuracy of SSA-RF was compared with that of partial least squares regression (PLSR), extreme gradient boosting (XGBoost), and random forest (RF) models. Finally, the optimal yield estimation method was used to predict winter wheat yield in three typical years. The findings are as follows: 1) SSA-RF demonstrates superior performance in estimating winter wheat yield compared to the other algorithms. The best yield estimation is achieved by combining all four types of indicators with SSA-RF (R^(2)=0.805, RRMSE=9.9%). 2) Crop growth status and environmental indicators play significant roles in wheat yield estimation, accounting for 46% and 22% of the yield importance among all indicators, respectively. 3) Selecting indicators from October to April of the following year yielded the highest accuracy in winter wheat yield estimation, with an R^(2) of 0.826 and an RMSE of 9.0%. Yield estimates can be completed two months before the winter wheat harvest in June. 4) The prediction performance is slightly affected by severe drought. Compared with the severe drought year (2011) (R^(2)=0.680) and the normal year (2017) (R^(2)=0.790), the SSA-RF model has higher prediction accuracy for the wet year (2018) (R^(2)=0.820). This study could provide an innovative approach for remote sensing estimation of winter wheat yield.
The random forest algorithm was applied to study the nuclear binding energy and charge radius. A regularized root-mean-square error (RMSE) was proposed to avoid overfitting during the training of the random forest. The RMSE for nuclides with Z, N>7 is reduced to 0.816 MeV and 0.0200 fm compared with the six-term liquid drop model and a three-term nuclear charge radius formula, respectively. Of specific interest are the possible (sub)shells in the superheavy region, which are important for the search for new elements and the island of stability. The significance of shell features estimated by the Shapley additive explanation method suggests (Z, N)=(92, 142) and (98, 156) as possible subshells indicated by the binding energy. Because the presently observed data are far from the N=184 shell, which is suggested by mean-field investigations, its shell effect is not predicted by the present training. The significance analysis of the nuclear charge radius suggests Z=92 and N=136 as possible subshells. The effect is verified by the shell-corrected nuclear charge radius model.
Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment (WEEE) in developing countries, this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises (SMEs) that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment (EEE) to enhance their performance and profitability. The methodology employed an exploratory approach involving the application of general equilibrium theory (GET) to contextualize the study and generate relevant parameters for deploying the random forest regression learning algorithm for predictions. Machine learning was applied to 80% of the samples for training, while simulation was conducted on the remaining 20% of samples based on quantities of EEE utilized over a specific period, utilization rates, repair rates, and average lifespans. The results demonstrate that the model's predicted values are close to the actual quantities of generated WEEE. The model's performance was evaluated using the mean squared error (MSE), yielding satisfactory results. Based on this model, both companies and stakeholders can set realistic objectives for managing companies' WEEE, fostering sustainable socio-environmental practices.
Estimating the volume growth of forest ecosystems accurately is important for understanding carbon sequestration and achieving carbon neutrality goals. However, the key environmental factors affecting volume growth differ across scales and plant functional types. This study was therefore conducted to estimate the volume growth of Larix and Quercus forests, and its influencing factors, based on national-scale forestry inventory data in China using random forest algorithms. The results showed that the model performances for volume growth in natural forests (R^(2)=0.65 for Larix and 0.66 for Quercus) were better than those in planted forests (R^(2)=0.44 for Larix and 0.40 for Quercus). In both natural and planted forests, stand age showed a strong relative importance for volume growth (8.6%–66.2%), while the edaphic and climatic variables had a limited relative importance (<6.0%). The relationship between stand age and volume growth was unimodal in natural forests and increased linearly in planted Quercus forests. The specific locations (i.e., altitude and aspect) of sampling plots exhibited high relative importance for volume growth in planted forests (4.1%–18.2%). Altitude positively affected volume growth in planted Larix forests but negatively affected volume growth in planted Quercus forests. Similarly, the effects of other environmental factors on volume growth also differed between stand origins (planted versus natural) and plant functional types (Larix versus Quercus). These results highlight that stand age was the most important predictor of volume growth and that the effects of environmental factors on volume growth varied among stand origins and plant functional types. Our findings provide a framework for site-specific recommendations regarding the management practices necessary to maintain volume growth in China's forest ecosystems.
Spontaneous combustion of coal increases the temperature in the overburden strata adjoining coal seams and poses a challenge when loading blastholes. This condition, known as hot-hole blasting, is dangerous due to the increased possibility of premature explosions in loaded blastholes. Thus, it is crucial to load the blastholes with an appropriate amount of explosives within a short period to avoid premature detonation caused by high blasthole temperatures. Additionally, this helps achieve the desired fragment size. This study sought to ascertain the most influential variables for mean fragment size and their optimum values for blasting in a fiery seam. Data on blast design, rock mass, and fragmentation from 100 blasts in fiery seams of a coal mine were collected and used to develop mean fragmentation prediction models using soft computational techniques. The coefficient of determination (R^(2)), root mean square error (RMSE), mean absolute error (MAE), mean square error (MSE), variance accounted for (VAF), and coefficient of efficiency in percentage (CE) were calculated to validate the results. The results indicate that the random forest algorithm (RFA) outperforms the artificial neural network (ANN), response surface method (RSM), and decision tree (DT). The values of R^(2), RMSE, MAE, MSE, VAF, and CE for RFA are 0.94, 0.034, 0.027, 0.001, 93.58, and 93.01, respectively. Multiple parametric sensitivity analyses (MPSAs) of the input variables showed that the Schmidt hammer rebound number and the spacing-to-burden ratio are the most influential variables for the blast fragment size. The analysis was finally used to define the best blast design variables to achieve optimum fragment size from blasting. The optimum factor values for RFA of S/B, ld/B, and ls/ld are 1.03, 1.85, and 0.7, respectively.
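
The validation metrics named in the abstract above can be sketched in a few lines of plain Python. This is a minimal illustration using the common textbook definitions of R^(2), RMSE, MAE, MSE, and VAF; the abstract does not give its formulas, so the definitions, the function name `regression_metrics`, and the sample values are assumptions, and CE is left out because its definition is not stated.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return common regression metrics (textbook definitions, assumed here)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    mse = sum(e * e for e in errors) / n          # mean square error
    rmse = sqrt(mse)                              # root mean square error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error

    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination

    # VAF (%) = (1 - var(error) / var(observed)) * 100
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_true = ss_tot / n
    vaf = (1.0 - var_err / var_true) * 100.0

    return {"r2": r2, "rmse": rmse, "mae": mae, "mse": mse, "vaf": vaf}


if __name__ == "__main__":
    # Hypothetical observed and predicted values, not data from the study.
    observed = [0.30, 0.42, 0.55, 0.61]
    predicted = [0.32, 0.40, 0.52, 0.65]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions RMSE is the square root of MSE, which agrees with the reported pair: 0.034 squared is about 0.0012, or 0.001 at three decimals.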
The influence of multiscalar topography on soil distribution follows a complex pattern related to the overlay of pedological processes that occurred at different times, and these driving forces are correlated with many geomorphologic scales. The present study therefore tested the hypothesis that multiscale geomorphometric generalized covariables can improve pedometric modeling. To this end, this case study applied the Random Forest algorithm to a multiscale geomorphometric database to predict soil surface attributes. The study area is in Phanerozoic sedimentary basins, in the Alter do Chão geological formation, Eastern Amazon, Brazil. The multiscale geomorphometric generalization was applied to general and specific geomorphometric covariables, producing groups for each scale combination. The modeling was run using Random Forest for A-horizon thickness, pH, and silt and sand content. For model evaluation, visual analysis of digital maps, metrics of forest structures, and the effect of variables on prediction were used. For the evaluation of soil textural classifications, the confusion matrix with a Kappa index and the user's and producer's accuracies were employed. The geomorphometric generalization tends to smooth curvatures and produces identifiable geomorphic representations at sub-watershed and watershed levels. The forest structures and the effect of variables on prediction are in agreement with pedological knowledge. The multiscale geomorphometric generalized covariables improved the accuracy metrics of soil surface texture classification, with the Kappa index rising from 43% to 62%. Therefore, it can be argued that topography influences soil distribution at combined coarser spatial scales and can predict soil particle size contents in the studied watershed. Future development of the multiscale geomorphometric generalization framework could include generalization methods that preserve features and landform classification adaptable to multiple scales.
The accurate identification of the oil-paper insulation state of a transformer is crucial for most maintenance strategies. This paper presents a multi-feature comprehensive evaluation model based on combination weighting and an improved technique for order of preference by similarity to ideal solution (TOPSIS) method to perform an objective and scientific evaluation of the transformer oil-paper insulation state. Firstly, multiple aging features are extracted from the recovery voltage polarization spectrum and the extended Debye equivalent circuit owing to the limitations of using a single feature for evaluation. A standard evaluation index system is then established by using the collected time-domain dielectric spectrum data. Secondly, this study implements the per-unit value concept to integrate the dimension of the index matrix and calculates the objective weight by using the random forest algorithm. Furthermore, it combines the weighting model to overcome the drawbacks of the single weighting method by using the indicators and considering the subjective experience of experts and the random forest algorithm. Lastly, the enhanced TOPSIS approach is used to determine the insulation quality of an oil-paper transformer. A verification example demonstrates that the evaluation model developed in this study can efficiently and accurately diagnose the insulation status of transformers. Essentially, this study presents a novel approach for the assessment of transformer oil-paper insulation.
The aim of this study is to evaluate the ability of a random forest algorithm that combines data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test, and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen, and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. Positive predictive value was 86.72%, and negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels, and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than any of these diagnostic indicators on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
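
The five diagnostic figures in the abstract above all derive from one 2x2 confusion matrix, which the following sketch makes explicit. The abstract does not report the cell counts; the counts used here (235 true positives, 123 false negatives, 36 false positives, 547 true negatives) are an assumption, back-calculated from the reported percentages and the cohort size of 941, and the function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute standard diagnostic metrics from 2x2 confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # correct calls over all patients
        "sensitivity": tp / (tp + fn),      # detected cancers over all cancers
        "specificity": tn / (tn + fp),      # correct negatives over all non-cancers
        "ppv": tp / (tp + fp),              # positive predictive value
        "npv": tn / (tn + fn),              # negative predictive value
    }


if __name__ == "__main__":
    # Assumed counts, back-calculated from the reported percentages (n = 941);
    # they are not stated in the abstract.
    metrics = diagnostic_metrics(tp=235, fn=123, fp=36, tn=547)
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

With these assumed counts, the script reproduces the reported accuracy, sensitivity, specificity, and predictive values to two decimal places. That supports the internal consistency of the figures but does not confirm the actual counts.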
This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features when the object-based approach with MRS is used to process such images. Given the high number of input features, an automatic method is needed to estimate this parameter. Accordingly, we used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is applied to the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three different hyperspectral datasets with various spectral and spatial characteristics. These data were acquired by the HyMap, the Airborne Prism Experiment (APEX), and the Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping in various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48, 86.57, and 84.29% for the HyMap, the APEX, and the CASI datasets, respectively. Moreover, this method showed better efficiency than the spectral-based classifications: the OAs of the proposed method were 5.67 and 3.75% higher than those of the conventional RFC and SVM classifiers, respectively.
When applied to unbalanced data, the traditional random forest algorithm cannot achieve satisfactory prediction results for the minority class and suffers from the parameter selection dilemma. To address this problem, this paper proposes an unbalanced accuracy weighted random forest algorithm (UAW_RF) based on adaptive step size artificial bee colony optimization. It combines the ideas of decision tree optimization, sampling selection, and weighted voting to improve the ability of the random forest algorithm to classify biased data. The adaptive step size and the optimal solution were introduced to improve the position updating formula of the artificial bee colony algorithm, and the parameter combination of the random forest algorithm was then iteratively optimized using the improved algorithm. Experimental results show satisfactory accuracies and indicate that the method can effectively improve the classification accuracy of the random forest algorithm.
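
The weighted-voting idea mentioned in the abstract above can be illustrated with a small, generic sketch. It is not the UAW_RF procedure, whose weighting scheme the abstract does not describe; the function name `weighted_vote`, the labels, and the per-tree weights are all hypothetical, and the weights simply stand in for some measure of how much each tree is trusted.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across tree predictions."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # On a tie, the label that was seen first wins.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Three hypothetical trees: two predict the majority class, one the minority.
    tree_predictions = ["majority", "majority", "minority"]

    # Plain majority voting gives every tree the same weight.
    print(weighted_vote(tree_predictions, [1.0, 1.0, 1.0]))

    # Hypothetical weights that favour the tree trusted on the minority class.
    print(weighted_vote(tree_predictions, [0.3, 0.3, 0.9]))
```

With equal weights the two majority votes win; with the hypothetical weights the single higher-weighted tree outvotes them, which is how weighting can shift decisions toward the minority class.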
The Gaofen-1 (GF-1) satellite carries a wide-field-of-view (WFV) imager that observes the earth environment in four solar reflective bands at a spatial resolution of 16 m. Chlorophyll-a (Chl-a) concentration data for Lake Taihu, China, from 2018 to 2019 were collected and collocated with GF-1 satellite data. This study develops a general and reliable estimation of Chl-a concentration from GF-1 WFV data under turbid inland water conditions. The collocated data are classified according to season and used in random forest (RF) regression to train models for retrieving the lake Chl-a concentration. A composite index is developed to select the most important variables in the models. The models trained for each season perform better than the model trained on the whole-year data in terms of the coefficient of determination (R^(2)) between retrievals and observations. Specifically, the R^(2) values in spring, summer, autumn, and winter are 0.88, 0.88, 0.94, and 0.74, respectively, whereas that using the whole-year data is only 0.71. The Chl-a concentration in Lake Taihu exhibits an obvious seasonal change, with the highest values in summer, followed by autumn and spring, and the lowest in winter. The Chl-a concentration also displays an obvious spatial variation with season. High concentrations occur mainly in the northwest of the lake. The temporal and spatial changes of Chl-a concentration are largely consistent with the changes in the areas and times of cyanobacteria blooms based on Moderate Resolution Imaging Spectroradiometer (MODIS) data. The proposed algorithm can be operated without a priori knowledge of atmospheric conditions and water quality. Our study also demonstrates that GF-1 data are increasingly valuable for monitoring the Chl-a concentration of inland water bodies in China at a high spatial resolution.
The Ms8.0 Wenchuan earthquake of 2008 dramatically changed the terrain surface and caused long-term increases in the scale and frequency of landslides and debris flows. The changing trend of landslides in the earthquake-affected area over the decade since the earthquake remains largely unknown. In this study, we address this issue using supervised classification methods and multitemporal remote sensing images to study landslide evolution in the worst-affected area (Mianyuan River Basin) over a period of ten years. Satellite images were processed using the maximum likelihood method and the random forest algorithm to automatically map landslide occurrence from 2007 to 2018. The principal findings are as follows: (1) when compared with visual image analysis, the random forest algorithm had a good average accuracy rate of 87% for landslide identification; (2) postevent landslide occurrence has generally decreased with time, but heavy monsoonal seasons have caused temporary spikes in activity; and (3) the postearthquake landslide activity in the Mianyuan River Basin can be divided into a strong activity period (2008 to 2011), a medium activity period (2012 to 2016), and a weak activity period (post 2017). Landslide activity remains above the prequake level, with damaging events being rare but continuing to occur. Long-term remote sensing and on-site monitoring are required to understand the evolution of landslide activity after strong earthquakes.
Understanding the impact of climate change on vegetation and its evolution trend requires long-term accurate data on regional vegetation types and their geographical distribution. Currently, land use and land cover types are mainly obtained from remote sensing information. Little research has been conducted on the remote sensing interpretation of vegetation types and their geographical distributions that comprehensively uses remote sensing, climate, and terrain information. In this study, a new regional vegetation mapping method based on terrain, climate, and remote sensing was developed. It is supported by the random forest algorithm and the Google Earth Engine (GEE), a new-generation earth science data and analysis application platform, together with optimal vegetation mapping features selected using the mean decrease in impurity method and the out-of-bag error value from remote sensing, climate, and terrain information. The vegetation of the Qinghai-Xizang Plateau in 2020 was mapped at 10 m spatial resolution using this new method with Sentinel-2A/B remotely sensed images, climate data, and terrain data. The accuracy verification of the vegetation mapping on the Qinghai-Xizang Plateau showed an overall accuracy of 89.5% and a Kappa coefficient of 0.87. The results suggest that the regional vegetation mapping method proposed in this study can provide technical support for obtaining long-term accurate data on vegetation types and their geographical distributions on the Qinghai-Xizang Plateau and across the globe.
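
Overall accuracy and the Kappa coefficient, the two map-validation figures quoted in the abstract above, are both computed from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the function name and the two-class matrix are hypothetical and are not taken from the study.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples of reference class i mapped to class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total

    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical two-class validation matrix, not data from the study.
    confusion = [
        [45, 5],
        [10, 40],
    ]
    accuracy, kappa = overall_accuracy_and_kappa(confusion)
    print(f"overall accuracy: {accuracy:.2f}, kappa: {kappa:.2f}")
```

Kappa discounts the agreement expected by chance, so it is typically lower than the overall accuracy, as in the reported pair of 89.5% and 0.87.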
Historical biome changes on the Tibetan Plateau provide important information that improves our understanding of the alpine vegetation responses to climate changes. However, a comprehensively quantitative reconstruction of the historical Tibetan Plateau biomes is not possible due to the lack of quantitative methods that enable appropriate classification of alpine biomes based on proxy data such as fossil pollen records. In this study, a pollen-based biome classification model was developed by applying a random forest algorithm (a supervised machine learning method) based on modern pollen assemblages on and around the Tibetan Plateau, and its robustness was assessed by comparing its results with the predictions of the biomisation method. The results indicated that modern biome distributions reconstructed using the random forest model based on modern pollen data generally concurred with the observed zonal vegetation. The random forest model had a significantly higher accuracy than the biomisation method, indicating the former is a more suitable tool for reconstructing alpine biome changes on the Tibetan Plateau. The random forest model was then applied to reconstruct the Tibetan Plateau biome changes from 22 ka BP to the present based on 51 fossil pollen records. The reconstructed biome distribution changes on the Tibetan Plateau generally corresponded to global climate changes and Asian monsoon variations. In the Last Glacial Maximum, the Tibetan Plateau was mainly desert with subtropical forests distributed in the southeast. During the last deglaciation, the alpine steppe began expanding and gradually became zonal vegetation in the central and eastern regions. Alpine meadow occupied the eastern and southeastern areas of the Tibetan Plateau since the early Holocene, and the forest-meadow-steppe-desert pattern running southeast to northwest on the Tibetan Plateau was established afterwards. In the mid-Holocene, subtropical forests extended north, which reflected the "optimum" condition. During the late Holocene, alpine meadows and alpine steppes expanded south.
Recent years have witnessed the continuous discovery of new thermoelectric materials, with a paradigm shift from trial-and-error efforts to experience-based discovery and first-principles calculation. However, both the experimental and the first-principles routes to identifying a new compound are time- and resource-consuming. Here, we demonstrate a machine learning approach to discover new M_(2)X_(3)-type thermoelectric materials using only composition information. Starting from the classic Bi_(2)Te_(3) material, we constructed an M_(2)X_(3)-type thermoelectric material library with 720 compounds by isoelectronic substitution, of which only 101 compounds have crystalline structure information in the Inorganic Crystal Structure Database (ICSD) and Materials Project (MP) database. A model based on the random forest (RF) algorithm plus Bayesian optimization was used to explore the underlying principles that determine the crystal structures of the known compounds. The physical properties of the constituent elements (such as atomic mass, electronegativity, and ionic radius) were used to define the features of the compounds with the general formula ^(1)M^(2)M^(1)X^(2)X^(3)X (^(1)M + ^(2)M : ^(1)X + ^(2)X + ^(3)X = 2:3). The primary goal is to find new thermoelectric materials with the same rhombohedral structure as Bi_(2)Te_(3) by machine learning. The final trained RF model showed a high accuracy of 91% in predicting rhombohedral compounds. Finally, we selected four important features for polynomial fitting to the prediction results of the RF model and used the resulting polynomial function to make further discoveries outside the pre-defined material library.
Soil-borne plant diseases cause major economic losses globally. This is partly because their epidemiology is difficult to predict in agricultural fields, where multiple environmental factors could determine disease outcomes. Here we used a combination of field sampling and direct experimentation to identify key abiotic and biotic soil properties that can predict the occurrence of bacterial wilt caused by pathogenic Ralstonia solanacearum. By analyzing 139 tomato rhizosphere soil samples isolated from six provinces in China, we first show a clear link between soil properties, pathogen density, and plant health. Specifically, disease outcomes were positively associated with soil moisture, bacterial abundance, and bacterial community composition. Based on soil properties alone, a random forest machine learning algorithm could predict disease outcomes correctly in 75% of cases, with soil moisture being the most significant predictor. The importance of soil moisture was validated causally in a controlled greenhouse experiment, where the highest disease incidence was observed at 60% of maximum water holding capacity. Together, our results show that local soil properties can predict disease occurrence across a wider agricultural landscape, and that management of soil moisture could potentially offer a straightforward method for reducing crop losses to R. solanacearum.
The characteristics of iron ore are an essential factor in granulation. Three ores, namely specularite, magnetite concentrate, and limonite, were selected as adhesion powder to investigate the granulating behavior and the evolution of agglomeration. Experiments and modeling were performed to represent granulating behavior on the basis of selectivity, ballability, and adhesion rate. The mass fraction of water and the particle sizes of the adhesion and nucleation particles were set at (11±1)%, 0-1 mm, and 3-5 mm, respectively. Experimental results show that selectivity and ballability promote the evolution of granulation. The water absorption rate of specularite and the ballability of limonite are better. Coupling effects exist when two ores are mixed and are positive when the proportion of magnetite concentrate is greater than that of specularite or of the specularite and limonite blend. When three ores are mixed, the coupling effect presents a complex superposition state. A characterization model of the adhesion rate of mixing granulation was established using random forest algorithms. Its output is the adhesion rate, and its inputs include the water absorption rate, balling index, and mixing proportion. The model parameters are 957 trees and four branches, and the training and prediction errors of the model are 2.3% and 3.7%, respectively. Modeling indicates that the random forest model can be used to represent the coupling effects of mixing granulation.
The extensive use of greenhouses has brought soaring economic benefits for farming practitioners in China, and an overview of the spatio-temporal distribution of greenhouses is of great interest to agricultural practitioners and decision-makers. In this study, Landsat image based greenhouse maps of the Guanzhong Plain, Shaanxi, China were made using the random forest classification algorithm and visual interpretation on the Google Earth Engine. Changes in greenhouse area were investigated for seven years (i.e., 2000, 2003, 2006, 2010, 2013, 2015 and 2019), with a yearly overall accuracy of more than 90%. The results showed that the total area of greenhouses in the Guanzhong Plain demonstrated an increasing trend, from 5.92 km² in 2000 to 194.42 km² in 2019, with considerable growth between 2010 and 2015. The increase is largely attributed to government policy and economic profitability. The distribution of greenhouses shifted toward the central and eastern regions of the Guanzhong Plain. Greenhouses preferentially expanded into areas near rural roads, main rivers, and high elevation, with more than 45% of greenhouses distributed within 1 km of a county rural road. The principal component analysis based suitability evaluation showed that a total of 38.44% of the area was suitable for greenhouses.
A material's electronic properties and technological utility depend on its band gap value and the nature of the band gap (i.e., direct or indirect). The nature of a band gap is notoriously difficult to compute from first principles: it is computationally intensive to approximate and rather time consuming. Hence its prediction represents a challenging problem. A machine learning based approach offers a promising and computationally efficient means to address this problem. Here we predict the nature of the band gap for perovskite oxides (ABO₃) from elemental composition, ionic radius, ionic character and electronegativity. We do this by training machine learning models on computationally generated datasets. Knowing the nature of the band gap of the perovskite oxides (whether direct or indirect) plays a pivotal role in determining whether the perovskite can be used for photovoltaic or photocatalytic applications. A total of 5329 perovskite oxides are considered in this study. Here, we determine the correlation between the nature of the band gap and the composition of the perovskite oxide. A Random Forest algorithm is used for this prediction since it yielded higher accuracy (~91%) than the other machine learning models. The approach suggested here can be used to predict the nature of the band gap and can also aid in novel materials discovery within the family of perovskites. This is a robust, quick, and low-cost strategy to find novel materials, for light harvesting applications in particular. We also present feature ranking as it pertains to prediction of the nature of the band gap and discuss correlation between the features. We also show feature importance graphs and SHapley Additive exPlanations (SHAP) as relevant for prediction of the nature of band gaps. Using the approach reported, NaPuO₃ and VPbO₃ are discovered to be good candidates for solar cell materials (direct band gap ~1.5 eV). Novel composition predictions for targeted applications are the future, and our model is a step ahead in this direction.
Funding: Financially supported by the National Natural Science Foundation of China (No. 52174001), the National Natural Science Foundation of China (No. 52004064), the Hainan Province Science and Technology Special Fund "Research on Real-time Intelligent Sensing Technology for Closed-loop Drilling of Oil and Gas Reservoirs in Deepwater Drilling" (ZDYF2023GXJS012), and the Heilongjiang Provincial Government and Daqing Oilfield's first batch of scientific and technological key projects, "Research on the Construction Technology of Gulong Shale Oil Big Data Analysis System" (DQYT-2022-JS-750).
Abstract: Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling poses many challenges for lithology identification. This paper studies the problems of difficult feature information extraction, low precision of thin-layer identification and limited applicability of the model in intelligent lithology identification. The authors try to improve the comprehensive performance of the lithology identification model from three aspects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model based on a dynamic felling strategy weighted random forest algorithm (DFW-RF) is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging while drilling (LWD) parameters that significantly influence lithology identification. The comprehensive performance of the DFW-RF lithology identification model has been verified in applications to 3 wells in different areas. Compared with the prediction results of five typical lithology identification algorithms, the DFW-RF model has a higher lithology identification accuracy rate and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model is efficient for the real-time intelligent identification of lithologic information in closed-loop drilling and has greater applicability, making it worthy of wide use in logging interpretation.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 52079103).
Abstract: Precise and timely prediction of crop yields is crucial for food security and the development of agricultural policies. However, crop yield is influenced by multiple factors within complex growth environments. Previous research has paid relatively little attention to the interference of environmental factors and drought with the growth of winter wheat. Therefore, there is an urgent need for more effective methods to explore the inherent relationship between these factors and crop yield, making precise yield prediction increasingly important. This study used four types of indicators (meteorological, crop growth status, environmental, and drought index) from October 2003 to June 2019 in Henan Province as the basic data for predicting winter wheat yield. Using the sparrow search algorithm combined with random forest (SSA-RF) under different input indicators, the accuracy of winter wheat yield estimation was calculated. The estimation accuracy of SSA-RF was compared with that of partial least squares regression (PLSR), extreme gradient boosting (XGBoost), and random forest (RF) models. Finally, the optimal yield estimation method was used to predict winter wheat yield in three typical years. The findings are as follows: 1) SSA-RF demonstrates superior performance in estimating winter wheat yield compared to the other algorithms. The best yield estimation is achieved by combining all four types of indicators with SSA-RF (R² = 0.805, RRMSE = 9.9%). 2) Crop growth status and environmental indicators play significant roles in wheat yield estimation, accounting for 46% and 22% of the yield importance among all indicators, respectively. 3) Selecting indicators from October to April of the following year yielded the highest accuracy in winter wheat yield estimation, with an R² of 0.826 and an RMSE of 9.0%. Yield estimates can be completed two months before the winter wheat harvest in June. 4) The prediction performance is slightly affected by severe drought. Compared with the severe drought year (2011) (R² = 0.680) and the normal year (2017) (R² = 0.790), the SSA-RF model has higher prediction accuracy for the wet year (2018) (R² = 0.820). This study could provide an innovative approach for remote sensing estimation of winter wheat yield.
Funding: Supported by the Basic and Applied Basic Research Project of Guangdong Province (2021B0301030006).
Abstract: The random forest algorithm was applied to study the nuclear binding energy and charge radius. A regularized root-mean-square error (RMSE) was proposed to avoid overfitting during the training of the random forest. The RMSE for nuclides with Z, N > 7 is reduced to 0.816 MeV and 0.0200 fm compared with the six-term liquid drop model and a three-term nuclear charge radius formula, respectively. Of specific interest are the possible (sub)shells in the superheavy region, which are important for the search for new elements and the island of stability. The significance of shell features estimated by the so-called Shapley additive explanation method suggests (Z, N) = (92, 142) and (98, 156) as possible subshells indicated by the binding energy. Because the presently observed data are far from the N = 184 shell, which is suggested by mean-field investigations, its shell effect is not predicted based on the present training. The significance analysis of the nuclear charge radius suggests Z = 92 and N = 136 as possible subshells. The effect is verified by the shell-corrected nuclear charge radius model.
Abstract: Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment (WEEE) in developing countries, this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises (SMEs) that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment (EEE) to enhance their performance and profitability. The methodology employed an exploratory approach involving the application of general equilibrium theory (GET) to contextualize the study and generate relevant parameters for deploying the random forest regression learning algorithm for predictions. Machine learning was applied to 80% of the samples for training, while simulation was conducted on the remaining 20% of samples based on quantities of EEE utilized over a specific period, utilization rates, repair rates, and average lifespans. The results demonstrate that the model's predicted values are close to the actual quantities of generated WEEE; the model's performance was evaluated using the mean squared error (MSE), yielding satisfactory results. Based on this model, both companies and stakeholders can set realistic objectives for managing companies' WEEE, fostering sustainable socio-environmental practices.
Funding: Supported by the Major Program of the National Natural Science Foundation of China (No. 32192434), the Fundamental Research Funds of the Chinese Academy of Forestry (No. CAFYBB2019ZD001), and the National Key Research and Development Program of China (2016YFD060020602).
Abstract: Estimating the volume growth of forest ecosystems accurately is important for understanding carbon sequestration and achieving carbon neutrality goals. However, the key environmental factors affecting volume growth differ across scales and plant functional types. This study was therefore conducted to estimate the volume growth of Larix and Quercus forests, and its influencing factors, based on national-scale forestry inventory data in China using random forest algorithms. The results showed that the model performance for volume growth in natural forests (R² = 0.65 for Larix and 0.66 for Quercus) was better than that in planted forests (R² = 0.44 for Larix and 0.40 for Quercus). In both natural and planted forests, stand age showed a strong relative importance for volume growth (8.6%–66.2%), while the edaphic and climatic variables had a limited relative importance (<6.0%). The relationship between stand age and volume growth was unimodal in natural forests and linearly increasing in planted Quercus forests. In addition, the specific locations (i.e., altitude and aspect) of sampling plots exhibited high relative importance for volume growth in planted forests (4.1%–18.2%). Altitude positively affected volume growth in planted Larix forests but negatively affected it in planted Quercus forests. Similarly, the effects of other environmental factors on volume growth also differed between stand origins (planted versus natural) and plant functional types (Larix versus Quercus). These results highlight that stand age was the most important predictor of volume growth and that environmental factors had diverse effects on volume growth among stand origins and plant functional types. Our findings provide a good framework for site-specific recommendations regarding the management practices necessary to maintain volume growth in China's forest ecosystems.
Abstract: Spontaneous combustion of coal increases the temperature in the overburden strata adjoining coal seams and poses a challenge when loading blastholes. This condition, known as hot-hole blasting, is dangerous due to the increased possibility of premature explosions in loaded blastholes. Thus, it is crucial to load the blastholes with an appropriate amount of explosives within a short period to avoid premature detonation caused by high blasthole temperatures. Additionally, this helps achieve the desired fragment size. This study tried to ascertain the most influential variables for mean fragment size and their optimum values for blasting in a fiery seam. Data on blast design, rock mass, and fragmentation for 100 blasts in fiery seams of a coal mine were collected and used to develop mean fragmentation prediction models using soft computing techniques. The coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean square error (MSE), variance accounted for (VAF) and coefficient of efficiency in percentage (CE) were calculated to validate the results. The results indicate that the random forest algorithm (RFA) outperforms the artificial neural network (ANN), response surface method (RSM), and decision tree (DT). The values of R², RMSE, MAE, MSE, VAF, and CE for RFA are 0.94, 0.034, 0.027, 0.001, 93.58, and 93.01, respectively. Multiple parametric sensitivity analyses (MPSAs) of the input variables showed that the Schmidt hammer rebound number and spacing-to-burden ratio are the most influential variables for the blast fragment size. The analysis was finally used to define the best blast design variables to achieve the optimum fragment size from blasting. The optimum factor values for RFA of S/B, ld/B and ls/ld are 1.03, 1.85 and 0.7, respectively.
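The validation statistics named in this abstract can be sketched in a few lines. This is a minimal illustration using common textbook definitions, not the authors' code: the function name `regression_metrics` is hypothetical, R² is taken as 1 − SS_res/SS_tot, and VAF as (1 − var(error)/var(observed)) × 100; the paper may define either quantity differently.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, R^2 and VAF (%) for paired observations.

    Assumed definitions (not taken from the paper):
      R^2 = 1 - SS_res / SS_tot
      VAF = (1 - var(y_true - y_pred) / var(y_true)) * 100
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # zero if y_true is constant

    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,
        "vaf": (1.0 - var_err / (ss_tot / n)) * 100.0,
    }


# Illustrative values only: predictions offset from the observations by a constant 1.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

With these definitions a constant bias lowers R² but leaves VAF at 100, which is one reason studies often report several of these statistics side by side.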
Abstract: The multiscalar influence of topography on soil distribution has a complex pattern that is related to the overlay of pedological processes that occurred at different times, and these driving forces are correlated with many geomorphologic scales. In this sense, the present study tested the hypothesis that multiscale geomorphometric generalized covariables can improve pedometric modeling. To achieve this goal, this case study applied the Random Forest algorithm to a multiscale geomorphometric database to predict soil surface attributes. The study area is in Phanerozoic sedimentary basins, in the Alter do Chão geological formation, Eastern Amazon, Brazil. The multiscale geomorphometric generalization was applied to general and specific geomorphometric covariables, producing groups for each scale combination. The modeling was run using Random Forest for A-horizon thickness, pH, and silt and sand content. For model evaluation, visual analysis of digital maps, metrics of forest structures and the effect of variables on prediction were used. For evaluation of soil textural classifications, the confusion matrix with a Kappa index and the user's and producer's accuracies were employed. The geomorphometric generalization tends to smooth curvatures and produces identifiable geomorphic representations at sub-watershed and watershed levels. The forest structures and the effect of variables on prediction are in agreement with pedological knowledge. The multiscale geomorphometric generalized covariables improved the accuracy metrics of soil surface texture classification, with the Kappa index going from 43% to 62%. Therefore, it can be argued that topography influences soil distribution at combined coarser spatial scales and is able to predict soil particle size contents in the studied watershed. Future development of the multiscale geomorphometric generalization framework could include generalization methods concerning preservation of features and landform classification adaptable to multiple scales.
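The Kappa index and the user's and producer's accuracies mentioned in this abstract are all derived from a confusion matrix. The sketch below uses the standard Cohen's kappa definition; the function name `kappa_and_accuracies`, the row/column convention, and the example counts are assumptions for illustration and are not taken from the study.

```python
def kappa_and_accuracies(matrix):
    """Compute Cohen's kappa plus per-class producer's and user's accuracies.

    Assumed convention: matrix[i][j] is the number of samples whose
    reference class is i and whose predicted class is j. Every class is
    assumed to occur at least once in both the reference and the prediction.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # reference totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals

    observed = sum(matrix[i][i] for i in range(k)) / total
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)

    # Producer's accuracy: correct / reference total (complement of omission error).
    producers = [matrix[i][i] / row_totals[i] for i in range(k)]
    # User's accuracy: correct / predicted total (complement of commission error).
    users = [matrix[j][j] / col_totals[j] for j in range(k)]
    return kappa, producers, users


# Illustrative two-class counts, not data from the study.
kappa, producers, users = kappa_and_accuracies([[40, 10], [5, 45]])
```

For these illustrative counts the observed agreement is 0.85 and the chance agreement is 0.50, so kappa works out to 0.70.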
Funding: Supported by the Natural Science Foundation of the Fujian Province (2021J01109).
Abstract: The accurate identification of the oil-paper insulation state of a transformer is crucial for most maintenance strategies. This paper presents a multi-feature comprehensive evaluation model based on combination weighting and an improved technique for order of preference by similarity to ideal solution (TOPSIS) method to perform an objective and scientific evaluation of the transformer oil-paper insulation state. Firstly, because of the limitations of using a single feature for evaluation, multiple aging features are extracted from the recovery voltage polarization spectrum and the extended Debye equivalent circuit. A standard evaluation index system is then established by using the collected time-domain dielectric spectrum data. Secondly, this study applies the per-unit value concept to unify the dimensions of the index matrix and calculates the objective weights by using the random forest algorithm. Furthermore, it uses a combination weighting model that merges the subjective experience of experts with the random forest weights to overcome the drawbacks of a single weighting method. Lastly, the enhanced TOPSIS approach is used to determine the insulation quality of an oil-paper transformer. A verification example demonstrates that the evaluation model developed in this study can efficiently and accurately diagnose the insulation status of transformers. Essentially, this study presents a novel approach for the assessment of transformer oil-paper insulation.
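For readers unfamiliar with TOPSIS, the following is a minimal sketch of the classic method only; the abstract describes an improved variant and a combination weighting scheme whose details are not given here. The function name `topsis`, the vector normalization step, and the example values are assumptions for illustration; in the paper's setting the `weights` argument would come from the combined expert and random forest weights.

```python
import math


def topsis(matrix, weights, benefit):
    """Classic TOPSIS closeness scores (higher means closer to the ideal).

    matrix[i][j] : value of alternative i on criterion j
    weights[j]   : weight of criterion j (hypothetical values in the example)
    benefit[j]   : True if larger is better for criterion j, False if smaller is
    Assumes the alternatives are not all identical.
    """
    n_crit = len(weights)

    # Step 1: vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]

    # Step 2: positive-ideal and negative-ideal solutions per criterion.
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]

    # Step 3: relative closeness from the Euclidean distances to both solutions.
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Three illustrative alternatives scored on two benefit criteria.
scores = topsis([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [0.6, 0.4], [True, True])
```

In this toy case the third alternative coincides with the positive-ideal solution and the first with the negative-ideal one, so their scores are 1 and 0, with the second alternative exactly halfway between.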
Abstract: The aim of this study is to evaluate the ability of a random forest algorithm that combines data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test, and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. Positive predictive value was 86.72%, and negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than any of these diagnostic indicators on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
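The five diagnostic measures reported in this abstract all follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name `diagnostic_metrics` and the example counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard binary diagnostic measures from confusion-matrix counts.

    tp: diseased and predicted diseased      fp: healthy but predicted diseased
    fn: diseased but predicted healthy       tn: healthy and predicted healthy
    Assumes every denominator below is non-zero.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Illustrative counts only.
result = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
```

A pattern like the one reported, with high specificity and lower sensitivity, corresponds to a model that rarely flags a healthy patient but misses a larger share of true cases.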
Abstract: This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on the multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features used when processing such images with the object-based MRS approach. Given the high number of input features, an automatic method is needed to estimate this parameter. We therefore used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and the other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is applied to the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three different hyperspectral datasets with various spectral and spatial characteristics. These data were acquired by the HyMap, the Airborne Prism Experiment (APEX), and the Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping in various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48, 86.57, and 84.29% for the HyMap, the APEX, and the CASI datasets, respectively. Moreover, this method showed better efficiency than the spectral-based classifications because the OAs of the proposed method were 5.67 and 3.75% higher than those of the conventional RFC and SVM classifiers, respectively.
Funding: The CERNET Innovation Project (No. NGII20190315) and the Foundation of A Hundred Youth Talents Training Program of Lanzhou Jiaotong University.
Abstract: When applied to unbalanced data, the traditional random forest algorithm cannot achieve satisfactory prediction results for the minority class and suffers from the parameter selection dilemma. To address this problem, this paper proposes an unbalanced accuracy weighted random forest algorithm (UAW_RF) based on adaptive step size artificial bee colony optimization. It combines the ideas of decision tree optimization, sampling selection, and weighted voting to improve the ability of the random forest algorithm to classify biased data. The adaptive step size and the optimal solution were introduced to improve the position updating formula of the artificial bee colony algorithm, and the parameter combination of the random forest algorithm was then iteratively optimized using the improved artificial bee colony algorithm. Experimental results show satisfactory accuracies and indicate that the method can effectively improve the classification accuracy of the random forest algorithm.
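The weighted voting idea mentioned in this abstract can be illustrated with a short sketch. The abstract does not state how UAW_RF computes its tree weights, so the weights below are hypothetical per-tree scores (for example, a class-balanced accuracy measured on held-out samples), and the function name `weighted_vote` is an assumption for illustration.

```python
from collections import defaultdict


def weighted_vote(tree_predictions, tree_weights):
    """Return the label with the largest total weight across trees.

    tree_predictions[i] : label predicted by tree i for one sample
    tree_weights[i]     : non-negative weight of tree i (hypothetical scores)
    Ties go to the label that was encountered first.
    """
    if len(tree_predictions) != len(tree_weights) or not tree_predictions:
        raise ValueError("need one weight per tree and at least one tree")

    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Two weak trees vote "majority"; one tree that scores well on the
# minority class votes "minority".
votes = ["majority", "majority", "minority"]
plain = weighted_vote(votes, [1.0, 1.0, 1.0])     # equal weights: simple majority
weighted = weighted_vote(votes, [0.2, 0.2, 0.9])  # weights favour the stronger tree
```

With equal weights the function reduces to ordinary majority voting; with unequal weights a single well-performing tree can outvote several weaker ones, which is the mechanism that lets a weighted forest recover minority-class predictions.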
Funding: Supported by the National Key Research and Development Program of China (2018YFC1506500) and the Foundation for Key Scientific Research of Jiangsu Meteorological Bureau (KZ202003).
Abstract: A wide-field-of-view (WFV) imager that observes the earth environment in four solar reflective bands at a spatial resolution of 16 m is carried on board the Gaofen-1 (GF-1) satellite. Chlorophyll-a (Chl-a) concentration data for Lake Taihu, China, from 2018 to 2019 were collected and collocated with GF-1 satellite data. This study develops a general and reliable estimation of Chl-a concentration from GF-1 WFV data under turbid inland water conditions. The collocated data are classified according to season and used in random forest (RF) regression to train models for retrieving the lake Chl-a concentration. A composite index is developed to select the most important variables in the models. The models trained for each season show better performance than the model trained on the whole-year data in terms of the coefficient of determination (R²) between retrievals and observations. Specifically, the R² values in spring, summer, autumn, and winter are 0.88, 0.88, 0.94, and 0.74, respectively, whereas that using the whole-year data is only 0.71. The Chl-a concentration in Lake Taihu exhibits an obvious seasonal change, being highest in summer, followed by autumn and spring, and lowest in winter. The Chl-a concentration also displays an obvious spatial variation with season. High concentrations occur mainly in the northwest of the lake. The temporal and spatial changes of Chl-a concentration are almost consistent with the changes in the areas and times of cyanobacteria blooms based on Moderate Resolution Imaging Spectroradiometer (MODIS) data. The proposed algorithm can be operated without a priori knowledge of atmospheric conditions and water quality. Our study also demonstrates that GF-1 data are increasingly valuable for monitoring the Chl-a concentration of inland water bodies in China at a high spatial resolution.
Funding: Financially supported by the National Key R&D Program (No. 2018YFC1505402), the Key Research and Development Program of Sichuan Province (No. 2023YFS0435), the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection Independent Research Project (No. SKLGP2014Z004), and the Science and Technology Innovation Fund of Sichuan Earthquake Agency (No. 201901).
Abstract: The Ms 8.0 Wenchuan earthquake of 2008 dramatically changed the terrain surface and caused long-term increases in the scale and frequency of landslides and debris flows. The changing trend of landslides in the earthquake-affected area over the decade since the earthquake remains largely unknown. In this study, we address this issue using supervised classification methods and multitemporal remote sensing images to study landslide evolution in the worst-affected area (Mianyuan River Basin) over a period of ten years. Satellite images were processed using the maximum likelihood method and the random forest algorithm to automatically map landslide occurrence from 2007 to 2018. The principal findings are as follows: (1) when compared with visual image analysis, the random forest algorithm had a good average accuracy rate of 87% for landslide identification; (2) post-event landslide occurrence has generally decreased with time, but heavy monsoonal seasons have caused temporary spikes in activity; and (3) the post-earthquake landslide activity in the Mianyuan River Basin can be divided into a strong activity period (2008 to 2011), a medium activity period (2012 to 2016), and a weak activity period (post 2017). Landslide activity remains above the pre-quake level, with damaging events being rare but continuing to occur. Long-term remote sensing and on-site monitoring are required to understand the evolution of landslide activity after strong earthquakes.
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2019QZKK0106).
Abstract: Understanding the impact of climate change on vegetation and its evolution trend requires long-term accurate data on regional vegetation types and their geographical distribution. Currently, land use and land cover types are mainly obtained from remote sensing information. Little research has been conducted on the remote sensing interpretation of vegetation types and their geographical distributions that makes comprehensive use of remote sensing, climate, and terrain. In this study, a new regional vegetation mapping method based on terrain-climate-remote sensing was developed, supported by the random forest algorithm and the Google Earth Engine (GEE), a new-generation earth science data and analysis application platform. Optimal vegetation mapping features were obtained from the remote sensing, climate, and terrain information using the mean decrease in impurity method and the out-of-bag error. Using this new method with Sentinel-2A/B remotely sensed images and climate and terrain data, the vegetation of the Qinghai-Xizang Plateau in 2020 was mapped at 10 m spatial resolution. The accuracy verification of vegetation mapping on the Qinghai-Xizang Plateau showed an overall accuracy of 89.5% and a Kappa coefficient of 0.87. The results suggest that the regional vegetation mapping method based on terrain-climate-remote sensing proposed in this study can provide technical support for obtaining long-term accurate data on vegetation types and their geographical distributions on the Qinghai-Xizang Plateau and around the globe.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 41690113), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA20070101), and the National Natural Science Foundation of China (Grant Nos. 42071114, 41977395, and 41671202).
Abstract: Historical biome changes on the Tibetan Plateau provide important information that improves our understanding of alpine vegetation responses to climate change. However, a comprehensive quantitative reconstruction of the historical Tibetan Plateau biomes has not been possible due to the lack of quantitative methods that enable appropriate classification of alpine biomes based on proxy data such as fossil pollen records. In this study, a pollen-based biome classification model was developed by applying a random forest algorithm (a supervised machine learning method) to modern pollen assemblages on and around the Tibetan Plateau, and its robustness was assessed by comparing its results with the predictions of the biomisation method. The results indicated that modern biome distributions reconstructed using the random forest model based on modern pollen data generally concurred with the observed zonal vegetation. The random forest model had a significantly higher accuracy than the biomisation method, indicating that the former is a more suitable tool for reconstructing alpine biome changes on the Tibetan Plateau. The random forest model was then applied to reconstruct the Tibetan Plateau biome changes from 22 ka BP to the present based on 51 fossil pollen records. The reconstructed biome distribution changes on the Tibetan Plateau generally corresponded to global climate changes and Asian monsoon variations. In the Last Glacial Maximum, the Tibetan Plateau was mainly desert, with subtropical forests distributed in the southeast. During the last deglaciation, the alpine steppe began expanding and gradually became the zonal vegetation in the central and eastern regions. Alpine meadow has occupied the eastern and southeastern areas of the Tibetan Plateau since the early Holocene, and the forest-meadow-steppe-desert pattern running southeast to northwest on the Tibetan Plateau was established afterwards. In the mid-Holocene, subtropical forests extended north, which reflected the "optimum" condition. During the late Holocene, alpine meadows and alpine steppes expanded south.
Funding: The National Key Research and Development Program of China (No. 2018YFB0703600) and Shenzhen Key Projects of Long-Term Support Plan (No. 20200925164021002).
Abstract: Recent years have witnessed the continuous discovery of new thermoelectric materials, with a paradigm shift from trial-and-error efforts to experience-based discovery and first-principles calculation. However, both the experimental and the first-principles routes to determining a new compound are time- and resource-consuming. Here, we demonstrate a machine learning approach to discover new M₂X₃-type thermoelectric materials using only composition information. Starting from the classic Bi₂Te₃ material, we constructed an M₂X₃-type thermoelectric material library with 720 compounds by using isoelectronic substitution, in which only 101 compounds have crystalline structure information in the Inorganic Crystal Structure Database (ICSD) and Materials Project (MP) database. A model based on the random forest (RF) algorithm plus Bayesian optimization was used to explore the underlying principles that determine the crystal structures of the known compounds. The physical properties of the constituent elements (such as atomic mass, electronegativity, ionic radius) were used to define the features of the compounds with the general formula ¹M²M¹X²X³X (¹M + ²M : ¹X + ²X + ³X = 2:3). The primary goal is to find, by machine learning, new thermoelectric materials with the same rhombohedral structure as Bi₂Te₃. The final trained RF model showed a high accuracy of 91% in the prediction of rhombohedral compounds. Finally, we selected four important features to perform polynomial fitting on the prediction results from the RF model and used the acquired polynomial function to make further discoveries outside the pre-defined material library.
Funding: the National Natural Science Foundation of China (41922053, 42090062, 31972504 and 42007038); the Fundamental Research Funds for the Central Universities (KJQN202116-KJQN202117, KYXK202009-KYXK202012); the Natural Science Foundation of Jiangsu Province (BK20190518, BK20180527 and BK20200533); the China Postdoctoral Science Foundation (2019M651848); the Bioinformatics Center of Nanjing Agricultural University. S.G. is funded by the NWO-Veni grant (016.Veni.181.078 to S.G.). V.F. is funded by the Royal Society (RSG\R1\180213 and CHL\R1\180031) and jointly by a grant from UKRI, Defra, and the Scottish Government, under the Strategic Priorities Fund Plant Bacterial Diseases programme (BB/T010606/1) at the University of York.
Abstract: Soil-borne plant diseases cause major economic losses globally. This is partly because their epidemiology is difficult to predict in agricultural fields, where multiple environmental factors could determine disease outcomes. Here we used a combination of field sampling and direct experimentation to identify key abiotic and biotic soil properties that can predict the occurrence of bacterial wilt caused by pathogenic Ralstonia solanacearum. By analyzing 139 tomato rhizosphere soil samples from six provinces in China, we first show a clear link between soil properties, pathogen density and plant health. Specifically, disease outcomes were positively associated with soil moisture, bacterial abundance and bacterial community composition. Based on soil properties alone, a random forest machine learning algorithm could predict disease outcomes correctly in 75% of cases, with soil moisture being the most significant predictor. The importance of soil moisture was validated causally in a controlled greenhouse experiment, where the highest disease incidence was observed at 60% of maximum water holding capacity. Together, our results show that local soil properties can predict disease occurrence across a wider agricultural landscape, and that management of soil moisture could potentially offer a straightforward method for reducing crop losses to R. solanacearum.
Funding: This work was financially supported by the National Natural Science Foundation of China (No. 51674042) and the Hunan Province 2011 Collaborative Innovation Center of Clean Energy and Smart Grid.
Abstract: The characteristics of iron ore are an essential factor in granulation. Three ores, namely specularite, magnetite concentrate and limonite, were selected as adhesion powder to investigate granulating behavior and the evolution process of agglomeration. Experiments and modeling were performed to represent granulating behavior on the basis of selectivity, ballability and adhesion rate. The mass fraction of water and the particle sizes of the adhesion and nucleation particles were set at (11±1)%, 0-1 mm and 3-5 mm, respectively. Experimental results show that selectivity and ballability promote the evolution of granulation. Specularite has the better water absorption rate, and limonite the better ballability. Coupling effects exist when two ores are mixed, and they are positive when the proportion of magnetite concentrate is greater than that of specularite or of a specularite and limonite blend. When three ores are mixed, the coupling effect presents a complex superposition state. A characterization model of the adhesion rate of mixed granulation was established using a random forest algorithm. Its output is the adhesion rate, and its inputs include water absorption rate, balling index and mixing proportion. The model parameters are 957 trees and four branches, and the training and prediction errors of the model are 2.3% and 3.7%, respectively. Modeling indicates that the random forest model can be used to represent the coupling effects of mixed granulation.
Abstract: The extensive use of greenhouses has brought soaring economic benefits for farming practitioners in China, and an overview of the spatio-temporal distribution of greenhouses is of great interest to agricultural practitioners and decision-makers. In this study, Landsat-image-based greenhouse maps of the Guanzhong Plain, Shaanxi, China were made using a random forest classification algorithm, with visual interpretation, on the Google Earth Engine. Changes in greenhouse area were investigated for seven years (2000, 2003, 2006, 2010, 2013, 2015 and 2019), with a yearly overall accuracy of more than 90%. The results showed that the total area of greenhouses in the Guanzhong Plain followed an increasing trend, from 5.92 km² in 2000 to 194.42 km² in 2019, with considerable growth between 2010 and 2015. The increase is largely attributed to government policy and economic profitability. The distribution of greenhouses has shifted towards the central and eastern regions of the Guanzhong Plain. Greenhouses preferentially expand into areas near rural roads and main rivers and at high elevation, with more than 45% of greenhouses distributed within 1 km of a county rural road. The suitability evaluation based on principal component analysis showed that a total of 38.44% of the area was suitable for greenhouses.
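To put the reported areas in perspective, the short calculation below derives the fold increase and an average annual growth rate from the two figures given in the abstract (5.92 km² in 2000 and 194.42 km² in 2019). The constant compound growth rate is a simplifying assumption made here for illustration only; the abstract itself notes that growth was concentrated between 2010 and 2015.

```python
# Areas reported in the abstract (km^2).
area_2000_km2 = 5.92
area_2019_km2 = 194.42
years = 2019 - 2000

# Ratio of final to initial area.
fold_increase = area_2019_km2 / area_2000_km2

# Compound annual growth rate, assuming (hypothetically) steady growth.
cagr = fold_increase ** (1 / years) - 1

print(f"fold increase: {fold_increase:.1f}x")
print(f"average annual growth: {cagr:.1%}")
```

Under that assumption, the figures correspond to a roughly 33-fold increase, or growth on the order of 20% per year.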
Funding: We would also like to thank the DST Water Technology Initiative project for financial support (File No: DST/TMD-EWO/WTI/2K19/EWFH/2019/122(G)). We would also like to acknowledge DST Materials for energy storage (File No: DST/TMD/MES/2K18/17) and the DST Indo-Hungary project here.
Abstract: A material's electronic properties and technological utility depend on its band gap value and the nature of the band gap (i.e. direct or indirect). The nature of a band gap is notoriously difficult to compute from first principles; even approximating it is computationally intensive and time-consuming. Hence its prediction represents a challenging problem. A machine learning based approach offers a promising and computationally efficient means to address this problem. Here we predict the nature of the band gap for perovskite oxides (ABO_(3)) from elemental composition, ionic radius, ionic character and electronegativity. We do this by training machine learning models on computationally generated datasets. Knowing the nature of the band gap of a perovskite oxide (whether direct or indirect) plays a pivotal role in determining whether the perovskite can be used for photovoltaic or photocatalytic applications. A total of 5329 perovskite oxides are considered in this study. Here, we determine the correlation between the nature of the band gap and the composition of the perovskite oxide. A random forest algorithm is used for this prediction since it yielded higher accuracy (~91%) than the other machine learning models. The approach suggested here can be used to predict the nature of the band gap and can also aid in novel materials discovery within the family of perovskites. This is a robust, quick, and low-cost strategy to find novel materials, for light harvesting applications in particular. We also present a feature ranking for the prediction of the nature of the band gap and discuss the correlation between the features. We also show feature importance graphs and SHapley Additive exPlanations (SHAP) as relevant for this prediction. Using the approach reported, NaPuO_(3) and VPbO_(3) are discovered to be good candidates for solar cell materials (direct band gap ~1.5 eV). Novel composition predictions for targeted applications are the future, and our model is a step ahead in this direction.