Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling poses many challenges for lithology identification. This paper addresses three problems in intelligent lithology identification: difficult feature extraction, low precision in identifying thin layers, and limited model applicability. The study seeks to improve the overall performance of the lithology identification model in three respects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model based on a dynamic felling strategy weighted random forest algorithm (DFW-RF) is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that significantly influence lithology identification. The overall performance of the DFW-RF model was verified in applications to 3 wells in different areas. Compared with the prediction results of five typical lithology identification algorithms, the DFW-RF model achieves a higher identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. It is efficient for real-time intelligent identification of lithologic information in closed-loop drilling, has broader applicability, and merits wide use in logging interpretation.
Precise and timely prediction of crop yields is crucial for food security and the development of agricultural policies. However, crop yield is influenced by multiple factors within complex growth environments. Previous research has paid relatively little attention to the interference of environmental factors and drought with the growth of winter wheat. There is therefore an urgent need for more effective methods to explore the inherent relationship between these factors and crop yield. This study used four types of indicators (meteorological, crop growth status, environmental, and drought index) from October 2003 to June 2019 in Henan Province as the basic data for predicting winter wheat yield. The accuracy of winter wheat yield estimation was calculated using the sparrow search algorithm combined with random forest (SSA-RF) under different input indicators. The estimation accuracy of SSA-RF was compared with that of partial least squares regression (PLSR), extreme gradient boosting (XGBoost), and random forest (RF) models. Finally, the optimal yield estimation method was used to predict winter wheat yield in three typical years. The findings are as follows: 1) SSA-RF outperforms the other algorithms in estimating winter wheat yield. The best estimate is obtained by combining all four types of indicators with SSA-RF (R² = 0.805, RRMSE = 9.9%). 2) Crop growth status and environmental indicators play significant roles in wheat yield estimation, accounting for 46% and 22% of the importance among all indicators, respectively. 3) Selecting indicators from October to April of the following year yielded the highest accuracy in winter wheat yield estimation, with an R² of 0.826 and an RMSE of 9.0%. Yield estimates can be completed two months before the winter wheat harvest in June. 4) Prediction performance is slightly affected by severe drought. Compared with the severe drought year (2011, R² = 0.680) and the normal year (2017, R² = 0.790), the SSA-RF model has higher prediction accuracy for the wet year (2018, R² = 0.820). This study could provide an innovative approach for remote sensing estimation of winter wheat yield.
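The two accuracy measures quoted in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes RRMSE means the RMSE divided by the mean observed yield, expressed as a percentage, which the abstract does not state. The function names are hypothetical.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def relative_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)
```

A perfect prediction gives an R² of 1 and an RRMSE of 0%; predicting the mean for every sample gives an R² of 0.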
The exploration of urban underground space is of great significance to urban planning, geological disaster prevention, resource exploration, and environmental monitoring. However, because of severe interference, conventional seismic methods do not adapt well to the complex urban environment. By adopting single-node data acquisition and taking seismic ambient noise as the signal, the microtremor horizontal-to-vertical spectral ratio (HVSR) method effectively avoids the strong interference caused by the complex urban environment, and it can obtain information such as the S-wave velocity and thickness of underground formations by fitting the microtremor HVSR curve. Nevertheless, HVSR curve inversion is a multi-parameter curve-fitting process, and conventional inversion methods easily converge to a local minimum, which directly affects the reliability of the inversion results. The authors therefore propose an HVSR inversion method based on the multimodal forest optimization algorithm, which uses an efficient clustering technique and locates the global optimum quickly. Tests on synthetic data show that the inversion results of the proposed method are consistent with the forward model. Both the adaptability and the stability of the method for models with an abnormal-velocity layer are demonstrated. The results for real field data are also verified by drilling information.
The random forest algorithm was applied to study the nuclear binding energy and charge radius. A regularized root-mean-square error (RMSE) was proposed to avoid overfitting during the training of the random forest. The RMSE for nuclides with Z, N > 7 is reduced to 0.816 MeV and 0.0200 fm compared with the six-term liquid drop model and a three-term nuclear charge radius formula, respectively. Of specific interest are the possible (sub)shells in the superheavy region, which are important in the search for new elements and the island of stability. The significance of shell features, estimated by the Shapley additive explanation method, suggests (Z, N) = (92, 142) and (98, 156) as possible subshells indicated by the binding energy. Because the presently observed data are far from the N = 184 shell suggested by mean-field investigations, its shell effect is not predicted by the present training. The significance analysis of the nuclear charge radius suggests Z = 92 and N = 136 as possible subshells. The effect is verified by the shell-corrected nuclear charge radius model.
Widely used deep neural networks currently face limitations in achieving optimal performance for purchase intention prediction due to constraints on data volume and hyperparameter selection. To address this issue, this paper proposes a novel Deep Adaptive Evolutionary Ensemble (DAEE) model based on the deep forest algorithm and further integrating evolutionary ensemble learning methods. The model introduces model diversity into the cascade layer, allowing it to adaptively adjust its structure to accommodate complex and evolving purchasing behavior patterns. Moreover, this paper optimizes the methods of obtaining feature vectors, enhancement vectors, and prediction results within the deep forest algorithm to enhance the model's predictive accuracy. Results demonstrate that the improved deep forest model not only is more robust but also shows an increase of 5.02% in AUC value compared to the baseline model. Furthermore, its training runtime speed is 6 times faster than that of deep models, and compared to other improved models, its accuracy is higher by 0.9%.
Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment (WEEE) in developing countries, this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises (SMEs) that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment (EEE) to enhance their performance and profitability. The methodology employed an exploratory approach in which general equilibrium theory (GET) was applied to contextualize the study and generate relevant parameters for deploying the random forest regression algorithm for predictions. The model was trained on 80% of the samples, while simulation was conducted on the remaining 20% based on quantities of EEE utilized over a specific period, utilization rates, repair rates, and average lifespans. The results demonstrate that the model's predicted values are close to the actual quantities of generated WEEE; the model's performance, evaluated using the mean squared error (MSE), was satisfactory. Based on this model, both companies and stakeholders can set realistic objectives for managing companies' WEEE, fostering sustainable socio-environmental practices.
With the arrival of the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the iForest algorithm keeps generating isolation trees, the differences between the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. The method optimizes the isolation forest by selecting better isolation trees according to their detection accuracy and their differences from one another, thereby removing duplicate, similar, and poorly performing isolation trees and improving the accuracy and stability of outlier detection. The experimental environment was built on an Ubuntu system with the Spark platform, and the outlier datasets provided by ODDS were used for testing. The performance of the proposed method was evaluated using indicators such as accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
Precise recovery of Coalbed Methane (CBM) based on transparent reconstruction of geological conditions is a branch of intelligent mining. The process of permeability reconstruction, ranging from data perception to real-time data visualization, is applicable to disaster risk warning and intelligent decision-making on gas drainage. In this study, a machine learning method integrating the Random Forest (RF) and the Genetic Algorithm (GA) was established for permeability prediction in the Xishan Coalfield based on Uniaxial Compressive Strength (UCS), effective stress, temperature, and gas pressure. A total of 50 sets of data collected by a self-developed apparatus were used to generate datasets for training and validating models. Statistical measures including the coefficient of determination (R²) and Root Mean Square Error (RMSE) were selected to validate and compare the predictive performances of the single RF model and the hybrid RF–GA model. Furthermore, sensitivity studies were conducted to evaluate the importance of input parameters. The results show that the proposed RF–GA model is robust in predicting permeability; UCS is directly correlated with permeability, while all other inputs are inversely related to permeability; and effective stress exerts the greatest impact on permeability based on importance score, followed by temperature (or gas pressure) and UCS. The partial dependence plots, indicative of the marginal utility of each feature in permeability prediction, are in line with experimental results. Thus, the proposed hybrid RF–GA model is capable of predicting permeability and is beneficial to precise CBM recovery.
Anomaly classification based on network traffic features is an important task for monitoring and detecting network intrusion attacks. Network-based intrusion detection systems (NIDSs) using machine learning (ML) methods are effective tools for protecting network infrastructures and services from unpredictable and unseen attacks. Among ML methods, random forest (RF) is a robust method that can be used in ML-based network intrusion detection solutions. However, the minimum number of instances for each split and the number of trees in the forest are two key parameters of RF that can affect classification accuracy. Optimal parameter selection is therefore a real problem in RF-based anomaly classification for intrusion detection systems. In this paper, we propose using the genetic algorithm (GA) to select appropriate values of these two parameters, optimizing the RF classifier and improving the classification accuracy for normal and abnormal network traffic. To validate the proposed GA-based RF model, a number of experiments are conducted on two public datasets and evaluated using a set of performance measures. In these experiments, the accuracy is compared with the accuracies of baseline ML classifiers in recent works. Experimental results reveal that the proposed model can avert the uncertainty in selecting the values of RF's parameters, improving the accuracy of anomaly classification in NIDSs without incurring excessive time.
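The idea of searching the two RF parameters with a genetic algorithm can be illustrated with a small sketch. Everything here is an assumption for illustration: the search ranges, the truncation selection, the crossover and mutation operators, and especially `toy_fitness`, which is a hypothetical stand-in for the cross-validated accuracy of an RF trained with the given parameters. It is not the procedure used in the paper.

```python
import random

def toy_fitness(n_trees, min_split):
    # Hypothetical stand-in for cross-validated accuracy; it peaks at (120, 4).
    return -((n_trees - 120) / 100.0) ** 2 - ((min_split - 4) / 10.0) ** 2

def genetic_search(fitness, generations=40, pop_size=20, seed=0):
    """Search (number of trees, minimum instances per split) with a simple GA."""
    rng = random.Random(seed)
    bounds = [(10, 300), (2, 20)]  # assumed search ranges for the two parameters

    def random_individual():
        return tuple(rng.randint(lo, hi) for lo, hi in bounds)

    def crossover(a, b):
        # Child takes the tree count from one parent and the split size from the other.
        return (a[0], b[1])

    def mutate(ind):
        # Perturb one randomly chosen gene and clip it to its range.
        genes = list(ind)
        i = rng.randrange(len(genes))
        lo, hi = bounds[i]
        step = max(1, (hi - lo) // 10)
        genes[i] = min(hi, max(lo, genes[i] + rng.randint(-step, step)))
        return tuple(genes)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=lambda ind: fitness(*ind), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children
    return max(population, key=lambda ind: fitness(*ind))

best_n_trees, best_min_split = genetic_search(toy_fitness)
```

In practice the fitness function would train and validate a classifier for each candidate, so the number of generations and the population size trade search quality against run time.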
To address the poor location accuracy caused by the harsh and complex underground environment, long strip roadways, limited wireless transmission, and sparse anchor nodes, an underground location algorithm based on random forest and compensation for environmental factors is proposed. First, the underground wireless access point (AP) network model and the tunnel environment were analyzed, and the fingerprint location algorithm was built. Then the Received Signal Strength (RSS) was processed with the Kalman filter algorithm in the offline sampling and real-time positioning stages. Meanwhile, a target speed constraint was introduced to reduce the error caused by environmental factors. The experimental results show that the proposed algorithm solves the problems of insufficient location accuracy and large environment-induced fluctuation when the anchor nodes are sparse. The average location accuracy reaches three meters, which satisfies applications such as underground rescue, activity track playback, disaster monitoring, and positioning. The algorithm has high application value in complex underground environments.
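The abstract above does not give the filter equations, so the following is only a generic one-dimensional Kalman filter for smoothing a sequence of RSS readings. It assumes a constant-signal model, and the variance values are illustrative defaults, not values from the paper.

```python
def kalman_smooth(rss_readings, process_var=1e-2, measurement_var=4.0):
    """Smooth noisy RSS readings (dBm) with a scalar Kalman filter.

    Assumes the true RSS is roughly constant between samples; process_var and
    measurement_var are illustrative and would need tuning for a real tunnel.
    """
    estimate = rss_readings[0]   # initial state: the first reading
    error_var = 1.0              # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in rss_readings[1:]:
        # Predict: the state is unchanged, but its uncertainty grows.
        error_var += process_var
        # Update: blend the prediction with the new measurement.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed

smoothed_rss = kalman_smooth([-60.0, -70.0, -58.0, -65.0])
```

Because each update is a weighted average of the previous estimate and the new reading, the smoothed values always stay within the range of the raw readings.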
Estimating the volume growth of forest ecosystems accurately is important for understanding carbon sequestration and achieving carbon neutrality goals. However, the key environmental factors affecting volume growth differ across scales and plant functional types. This study therefore used random forest algorithms to estimate the volume growth of Larix and Quercus forests, and its influencing factors, based on national-scale forestry inventory data in China. The results showed that the models of volume growth performed better in natural forests (R² = 0.65 for Larix and 0.66 for Quercus) than in planted forests (R² = 0.44 for Larix and 0.40 for Quercus). In both natural and planted forests, stand age showed a strong relative importance for volume growth (8.6%–66.2%), while the edaphic and climatic variables had a limited relative importance (<6.0%). The relationship between stand age and volume growth was unimodal in natural forests and linearly increasing in planted Quercus forests. The specific locations (i.e., altitude and aspect) of sampling plots exhibited high relative importance for volume growth in planted forests (4.1%–18.2%). Altitude positively affected volume growth in planted Larix forests but negatively affected it in planted Quercus forests. Similarly, the effects of other environmental factors on volume growth differed between stand origins (planted versus natural) and between plant functional types (Larix versus Quercus). These results highlight that stand age was the most important predictor of volume growth and that the effects of environmental factors on volume growth varied among stand origins and plant functional types. Our findings provide a framework for site-specific recommendations regarding the management practices necessary to maintain volume growth in China's forest ecosystems.
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, that employed four classification algorithms as their main statistical tools were reviewed: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and Neural Network (NN). The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths, and shortcomings of each algorithm were examined, and a conclusion was reached on which one performs best. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but becomes significantly slower as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF.
Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm. However, because of its time-consuming parameter tuning, its high computational complexity, the numerous NN architectures to choose from, and the large number of training algorithms, most researchers recommend SVM and RF as easier and widely used methods that repeatedly achieve high accuracy and are often faster to implement.
Forest harvesting adjustment is a large and complex decision-making system. In this paper, we analyze the shortcomings of traditional approaches to harvest adjustment and establish a multi-objective harvest adjustment model. As an intelligent optimization method, the chaotic genetic algorithm has a parallel mechanism and inherent global optimization characteristics that suit multi-objective planning problems, especially in complex cases with many objective functions and optimization variables. To solve the forest harvesting adjustment problem, this paper applies a genetic algorithm to forest harvesting adjustment at the Forest Farm of Qiujia, Liancheng, Longyan. The experimental results show that the method is feasible and effective and can provide satisfactory solutions for policy makers.
Let G be a simple graph with n vertices, m edges, and k connected components. The spanning forest problem is to find a spanning tree for each connected component of G. This problem has applications to the electrical power demand problem, computer network design, circuit analysis, and other areas. In this paper, we present an […]-time parallel algorithm with […] processors for constructing a spanning forest of a proper circle graph G on an EREW PRAM.
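To make the problem statement concrete, here is a sequential sketch that builds a spanning forest with a union-find structure. It is not the parallel EREW PRAM algorithm from the abstract, whose details are not given here; it only illustrates what a spanning forest is for a general simple graph. The function name and the edge-list input format are assumptions.

```python
def spanning_forest(n, edges):
    """Return the edges of a spanning forest of a graph on vertices 0..n-1.

    Sequential union-find sketch; an edge is kept only if it joins two
    vertices that are not yet connected by previously kept edges.
    """
    parent = list(range(n))

    def find(v):
        # Follow parent links to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v   # merge the two components
            forest.append((u, v))
    return forest

# Three components: {0, 1, 2}, {3, 4}, and the isolated vertex 5.
forest_edges = spanning_forest(6, [(0, 1), (1, 2), (0, 2), (3, 4)])
```

A graph with n vertices and k connected components always has a spanning forest with exactly n - k edges, which is a convenient check on the output.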
The Very Fast Decision Tree (VFDT) algorithm is a classification algorithm for data streams. When processing large amounts of data, VFDT requires less time than traditional decision tree algorithms. However, when training samples become fewer, the label values of VFDT leaf nodes contain more errors, and the classification ability of a single VFDT decision tree is limited. The Random Forest algorithm is an ensemble classifier with high prediction accuracy and noise tolerance. It consists of multiple decision trees and can compensate for the shortcomings of a single decision tree. In this paper, to improve classification accuracy on data streams, the Random Forest algorithm is integrated into the tree-building process of the VFDT algorithm, and a new Random Forest Based Very Fast Decision Tree algorithm, named RFVFDT, is designed. The RFVFDT algorithm adopts the decision tree building criterion of a Random Forest classifier and extends the Random Forest algorithm with a sliding window to handle the unboundedness of data streams and avoid processing delay and data loss. Experimental results on the KDD CUP data sets show that the classification accuracy of RFVFDT is higher than that of VFDT. The fewer the samples, the more obvious the advantage. RFVFDT is fast when running in multithreaded mode.
Airborne laser scanning (ALS) and terrestrial laser scanning (TLS) have attracted attention for their applications in forest parameter investigation and research. ALS is limited in obtaining fine structure information below the forest canopy because of occlusion by trees in natural forests. In contrast, TLS is unable to gather fine structure information about the upper canopy. To address the incomplete acquisition of natural forest point cloud data by ALS or TLS on a single platform, this study proposes a data registration method without control points. The original ALS and TLS data were cropped according to sample plot size, and the ALS point cloud data were converted into relative coordinates with the center of the cropped data as the origin. Matching feature point pairs in the ALS and TLS point cloud data were then selected to register the point clouds. The initially registered point cloud data were then finely registered with the iterative closest point (ICP) algorithm. The results show that the proposed method achieved high-precision registration of ALS and TLS point cloud data from two natural forest plots, of Pinus yunnanensis Franch. and Picea asperata Mast., which differ in species and environment. Average registration accuracies of 0.06 m and 0.09 m were obtained for P. yunnanensis and P. asperata, respectively.
Building on previous research on predicting β-hairpin motifs in proteins, we apply the Random Forest and Support Vector Machine algorithms to predict β-hairpin motifs in the ArchDB40 dataset. Motifs with loop lengths of 2 to 8 amino acid residues are extracted as the research object, and a fixed-length pattern of 12 amino acids is selected. With the same characteristic parameters and the same test method, the Random Forest algorithm is more effective than the Support Vector Machine. In addition, because the Random Forest algorithm does not overfit when the dimension of the characteristic parameters is higher, we use Random Forest with higher-dimensional characteristic parameters to predict β-hairpin motifs. Better prediction results are obtained: the overall accuracy and Matthews correlation coefficient in 5-fold cross-validation reach 83.3% and 0.59, respectively.
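For readers unfamiliar with the second metric reported above, the Matthews correlation coefficient for a binary classifier can be computed from the four confusion-matrix counts. The sketch below uses the standard definition; returning 0 when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it more informative than accuracy alone when the classes are unbalanced.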
Production optimization is important for carbonate reservoirs, directly affecting the sustainability and profitability of reservoir development. Traditional physics-based numerical simulations suffer from insufficient calculation accuracy and excessive time consumption when performing production optimization. We establish an ensemble proxy-model-assisted optimization framework combining the Bayesian random forest (BRF) with the particle swarm optimization (PSO) algorithm. The BRF method is used to construct a proxy model of the injection-production system that can accurately predict the dynamic parameters of producers based on injection data and production measures. With the help of the proxy model, PSO is applied to search for the optimal injection pattern, integrating Pareto front analysis. In experimental testing, the proxy model not only achieves higher prediction accuracy than deep learning but also requires 8 times less training time. In addition, the injection mode adjusted by the PSO algorithm can effectively reduce the gas-oil ratio and increase oil production by more than 10% for carbonate reservoirs. The proposed proxy-model-assisted optimization protocol brings new perspectives on multi-objective optimization problems in the petroleum industry and can provide more options for project decision-makers to balance oil production and the gas-oil ratio under physical and operational constraints.
Autism spectrum disorder (ASD), classified as a developmental disability, is now more common in children than ever. The drastic worldwide increase in the rate of ASD in children demands early detection. When ASD is diagnosed before the age of five, parents can seek professional help for a better prognosis of the child's therapy. This research study aims to develop an automated tool for diagnosing autism in children. The computer-aided diagnosis tool for ASD detection is designed and developed with a novel methodology that includes data acquisition, feature selection, and classification phases. The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification. The Imperialistic competitive algorithm (ICA), based on empires conquering colonies, performs feature selection in this study. The performance of the Logistic Regression (LR), Decision Tree, K-Nearest Neighbor (KNN), and Random Forest (RF) classifiers is studied experimentally. The experimental results show that the Logistic Regression classifier exhibits the highest accuracy on the self-acquired dataset. ASD detection is also evaluated experimentally with the Least Absolute Shrinkage and Selection Operator (LASSO) feature selection method and different classifiers. The Exploratory Data Analysis (EDA) phase uncovered crucial facts about the data, such as the correlation of the features in the dataset with the class variable.
Funding: Financially supported by the National Natural Science Foundation of China (No. 52174001 and No. 52004064); the Hainan Province Science and Technology Special Fund project "Research on Real-time Intelligent Sensing Technology for Closed-loop Drilling of Oil and Gas Reservoirs in Deepwater Drilling" (ZDYF2023GXJS012); and the Heilongjiang Provincial Government and Daqing Oilfield's first batch of scientific and technological key projects, "Research on the Construction Technology of Gulong Shale Oil Big Data Analysis System" (DQYT-2022-JS-750).
Abstract: Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling poses many challenges for lithology identification. This paper addresses three problems in intelligent lithology identification: difficult feature extraction, low precision in identifying thin layers, and limited model applicability. The study seeks to improve the overall performance of the lithology identification model in three respects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model based on a dynamic felling strategy weighted random forest algorithm (DFW-RF) is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that significantly influence lithology identification. The overall performance of the DFW-RF model was verified in applications to 3 wells in different areas. Compared with the prediction results of five typical lithology identification algorithms, the DFW-RF model achieves a higher identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. It is efficient for real-time intelligent identification of lithologic information in closed-loop drilling, has broader applicability, and merits wide use in logging interpretation.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 52079103).
Abstract: Precise and timely prediction of crop yields is crucial for food security and the development of agricultural policies. However, crop yield is influenced by multiple factors within complex growth environments. Previous research has paid relatively little attention to the interference of environmental factors and drought with the growth of winter wheat. Therefore, more effective methods are urgently needed to explore the inherent relationship between these factors and crop yield. This study used four types of indicators (meteorological, crop growth status, environmental, and drought index) from October 2003 to June 2019 in Henan Province as the basic data for predicting winter wheat yield. The accuracy of winter wheat yield estimation was calculated using the sparrow search algorithm combined with random forest (SSA-RF) under different input indicators. The estimation accuracy of SSA-RF was compared with that of partial least squares regression (PLSR), extreme gradient boosting (XGBoost), and random forest (RF) models. Finally, the optimal yield estimation method was used to predict winter wheat yield in three typical years. The findings are as follows: 1) SSA-RF demonstrates superior performance in estimating winter wheat yield compared with the other algorithms. The best yield estimation is achieved by combining all four types of indicators with SSA-RF (R² = 0.805, RRMSE = 9.9%). 2) Crop growth status and environmental indicators play significant roles in wheat yield estimation, accounting for 46% and 22% of the importance among all indicators, respectively. 3) Selecting indicators from October to April of the following year yielded the highest accuracy in winter wheat yield estimation, with an R² of 0.826 and an RMSE of 9.0%. Yield estimates can be completed two months before the winter wheat harvest in June. 4) The prediction performance is slightly affected by severe drought. Compared with the severe drought year (2011) (R² = 0.680) and the normal year (2017) (R² = 0.790), the SSA-RF model has higher prediction accuracy for the wet year (2018) (R² = 0.820). This study could provide an innovative approach for remote sensing estimation of winter wheat yield.
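The abstract above reports R² and RRMSE without defining them. As a minimal sketch, the two metrics can be computed as below, assuming R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and RRMSE is the RMSE divided by the mean observed yield, expressed as a percentage. Both definitions are common conventions rather than something the abstract states, and the function names are illustrative.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rrmse_percent(observed, predicted):
    """Relative RMSE: RMSE divided by the mean observation, in percent (assumed definition)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)
```

Some studies instead report R² as the squared Pearson correlation between observed and predicted values; the two agree only for least-squares linear fits, so figures from different papers may not be directly comparable.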
Funding: Supported by the National Natural Science Foundation of China (No. 42074150), the National Key Research and Development Program of China (No. 2023YFC3707901), and the Futian District Integrated Ground Collapse Monitoring and Early Warning System Construction Project (No. FTCG2023000209).
Abstract: The exploration of urban underground space is of great significance to urban planning, geological disaster prevention, resource exploration, and environmental monitoring. However, because of severe interference, conventional seismic methods adapt poorly to the complex urban environment. The microtremor horizontal-to-vertical spectral ratio (HVSR) method uses single-node data acquisition and takes seismic ambient noise as the signal, so it effectively avoids the strong interference caused by the complex urban environment; information such as the S-wave velocity and thickness of underground formations can be obtained by fitting the microtremor HVSR curve. Nevertheless, HVSR curve inversion is a multi-parameter curve-fitting process, and conventional inversion methods easily converge to a local minimum, which directly affects the reliability of the inversion results. The authors therefore propose an HVSR inversion method based on the multimodal forest optimization algorithm, which uses an efficient clustering technique and locates the global optimum quickly. Tests on synthetic data show that the inversion results of the proposed method are consistent with the forward model. Both the adaptability and the stability of the method for velocity models with abnormal layers are demonstrated. The results for real field data are also verified by drilling information.
Funding: Supported by the Basic and Applied Basic Research Project of Guangdong Province (2021B0301030006).
Abstract: The random forest algorithm was applied to study the nuclear binding energy and charge radius. A regularized root-mean-square error (RMSE) was proposed to avoid overfitting during the training of the random forest. The RMSE for nuclides with Z, N > 7 is reduced to 0.816 MeV and 0.0200 fm compared with the six-term liquid drop model and a three-term nuclear charge radius formula, respectively. Of specific interest are the possible (sub)shells in the superheavy region, which are important in the search for new elements and the island of stability. The significance of shell features estimated by the Shapley additive explanation (SHAP) method suggests (Z, N) = (92, 142) and (98, 156) as possible subshells indicated by the binding energy. Because the presently observed data are far from the N = 184 shell suggested by mean-field investigations, its shell effect is not predicted by the present training. The significance analysis of the nuclear charge radius suggests Z = 92 and N = 136 as possible subshells. The effect is verified by the shell-corrected nuclear charge radius model.
Funding: Supported by the Ningxia Key R&D Program (Key) Project (2023BDE02001), the Ningxia Key R&D Program (Talent Introduction Special) Project (2022YCZX0013), the North Minzu University 2022 School-Level Research Platform "Digital Agriculture Empowering Ningxia Rural Revitalization Innovation Team" (Project Number: 2022PT_S10), the Yinchuan City School-Enterprise Joint Innovation Project (2022XQZD009), and the "Innovation Team for Imaging and Intelligent Information Processing" of the National Ethnic Affairs Commission.
Abstract: Widely used deep neural networks currently struggle to achieve optimal performance in purchase intention prediction because of constraints on data volume and hyperparameter selection. To address this issue, this paper builds on the deep forest algorithm, further integrates evolutionary ensemble learning methods, and proposes a novel Deep Adaptive Evolutionary Ensemble (DAEE) model. The model introduces model diversity into the cascade layer, allowing it to adjust its structure adaptively to accommodate complex and evolving purchasing behavior patterns. Moreover, this paper optimizes the methods for obtaining feature vectors, enhancement vectors, and prediction results within the deep forest algorithm to improve the model's predictive accuracy. Results demonstrate that the improved deep forest model is more robust and increases the AUC value by 5.02% compared with the baseline model. Furthermore, its training is 6 times faster than that of deep models, and its accuracy is 0.9% higher than that of other improved models.
Abstract: Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment (WEEE) in developing countries, this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises (SMEs) that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment (EEE) to enhance their performance and profitability. The methodology took an exploratory approach, applying general equilibrium theory (GET) to contextualize the study and generate relevant parameters for deploying the random forest regression algorithm for predictions. The model was trained on 80% of the samples, while simulation was conducted on the remaining 20% based on the quantities of EEE used over a specific period, utilization rates, repair rates, and average lifespans. The results demonstrate that the model's predicted values are close to the actual quantities of generated WEEE; the model's performance, evaluated using the mean squared error (MSE), was satisfactory. Based on this model, both companies and stakeholders can set realistic objectives for managing companies' WEEE, fostering sustainable socio-environmental practices.
Funding: Supported by State Grid Liaoning Electric Power Supply Co., Ltd. through the project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" (No. SGLNXT00YJJS1800110).
Abstract: With the arrival of the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the isolation forest algorithm keeps generating isolation trees, the differences between the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. The method optimizes the isolation forest by selecting better isolation trees according to their detection accuracy and the differences between them, thereby removing duplicate, similar, and poorly performing isolation trees and improving the accuracy and stability of outlier detection. The experimental environment was built on an Ubuntu system and the Spark platform, and the outlier datasets provided by ODDS were used for testing. The performance of the proposed method was evaluated using accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
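For readers unfamiliar with the baseline that GA-iForest modifies, the following is a minimal pure-Python sketch of the original isolation forest scoring idea: points that are separated from the rest by few random splits receive scores closer to 1. It does not implement the tree-selection step of GA-iForest, which the abstract does not specify in detail. The function names, the dictionary-based tree layout, and the default parameters are illustrative assumptions, and the sketch assumes at least two training points.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # used to approximate harmonic numbers


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus a correction for unsplit leaf points."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def anomaly_scores(data, queries, n_trees=100, sample_size=64, seed=0):
    """Score s = 2 ** (-mean path length / c(sample_size)); higher means more anomalous."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))  # assumes len(data) >= 2
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(t, q) for t in trees) / (n_trees * norm))
            for q in queries]
```

Because every tree is built from an independent random subsample, many trees end up nearly interchangeable, which is the redundancy the abstract proposes to prune.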
Funding: Supported by the Fundamental Research Funds for the Central Universities [2017XKZD06].
Abstract: Precise recovery of coalbed methane (CBM) based on transparent reconstruction of geological conditions is a branch of intelligent mining. The process of permeability reconstruction, ranging from data perception to real-time data visualization, is applicable to disaster risk warning and intelligent decision-making on gas drainage. In this study, a machine learning method integrating the Random Forest (RF) and the Genetic Algorithm (GA) was established for permeability prediction in the Xishan Coalfield based on uniaxial compressive strength (UCS), effective stress, temperature, and gas pressure. A total of 50 sets of data collected with a self-developed apparatus were used to generate datasets for training and validating the models. Statistical measures, including the coefficient of determination (R²) and the root mean square error (RMSE), were selected to validate and compare the predictive performance of the single RF model and the hybrid RF-GA model. Furthermore, sensitivity studies were conducted to evaluate the importance of the input parameters. The results show that the proposed RF-GA model is robust in predicting permeability; UCS is directly correlated with permeability, while all other inputs are inversely related to it; and, based on the importance scores, effective stress exerts the greatest impact on permeability, followed by temperature (or gas pressure) and UCS. The partial dependence plots, which indicate the marginal effect of each feature on the permeability prediction, are in line with the experimental results. Thus, the proposed hybrid RF-GA model is capable of predicting permeability and is beneficial to precise CBM recovery.
Abstract: Anomaly classification based on network traffic features is an important task in monitoring and detecting network intrusion attacks. Network-based intrusion detection systems (NIDSs) using machine learning (ML) methods are effective tools for protecting network infrastructures and services from unpredictable and unseen attacks. Among ML methods, random forest (RF) is a robust method that can be used in ML-based network intrusion detection solutions. However, the minimum number of instances for each split and the number of trees in the forest are two key parameters of RF that can affect classification accuracy. Optimal parameter selection is therefore a real problem in RF-based anomaly classification for intrusion detection systems. In this paper, we propose using the genetic algorithm (GA) to select appropriate values of these two parameters, optimizing the RF classifier and improving the classification accuracy for normal and abnormal network traffic. To validate the proposed GA-based RF model, a number of experiments are conducted on two public datasets and evaluated using a set of performance measures. In these experiments, the accuracy is compared with that of baseline ML classifiers in recent works. Experimental results reveal that the proposed model can avert the uncertainty in selecting the values of RF's parameters, improving the accuracy of anomaly classification in NIDSs without incurring excessive time.
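The parameter search described above can be sketched as a small genetic algorithm over two integer genes (number of trees, minimum instances per split). This is a minimal illustration and not the authors' implementation: the operators (tournament selection, uniform crossover, random-reset mutation, elitism), the default settings, and the function name `genetic_search` are all assumptions. The `fitness` callable is a placeholder; in the setting of the abstract it would train a random forest with the candidate parameters and return a validation accuracy.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Maximize fitness over integer tuples; bounds is a list of (low, high) pairs."""
    rng = random.Random(seed)

    def random_individual():
        return tuple(rng.randint(lo, hi) for lo, hi in bounds)

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent.
        return tuple(rng.choice(pair) for pair in zip(a, b))

    def mutate(ind):
        # Random-reset mutation: occasionally redraw a gene within its bounds.
        return tuple(rng.randint(lo, hi) if rng.random() < mutation_rate else gene
                     for gene, (lo, hi) in zip(ind, bounds))

    def tournament(scored):
        # Pick the fittest of three randomly chosen individuals.
        return max(rng.sample(scored, 3), key=lambda s: s[0])[1]

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        elite = max(scored, key=lambda s: s[0])[1]
        children = [elite]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(tournament(scored), tournament(scored))
            children.append(mutate(child))
        population = children
    return max(population, key=fitness)
```

A call might look like `genetic_search(score_fn, [(10, 300), (2, 20)])`, where `score_fn` is a hypothetical function mapping `(n_trees, min_split)` to an accuracy. Since each fitness evaluation would involve training a forest, caching scores for repeated candidates is worth considering in practice.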
Funding: Supported by the Projects of Natural Science Foundation in Higher Education Institutions of Anhui Province (KJ2017A449) and Chaohu University's Innovation and Entrepreneurship Training Program for Provincial College Students in 2019 (No. S201910380042).
Abstract: To address the poor location accuracy caused by the harsh and complex underground environment, long strip roadways, limited wireless transmission, and sparse anchor nodes, an underground location algorithm based on random forest and compensation for environmental factors is proposed. First, the underground wireless access point (AP) network model and the tunnel environment were analyzed, and the fingerprint location algorithm was built. Then, the received signal strength (RSS) was processed with the Kalman filter algorithm in the offline sampling and real-time positioning stages. Meanwhile, a target speed constraint was introduced to reduce the error caused by environmental factors. The experimental results show that the proposed algorithm solves the problems of insufficient location accuracy and large environment-induced fluctuation when the anchor nodes are sparse. The average location accuracy reaches three meters, which satisfies the requirements of underground rescue, activity track playback, disaster monitoring, and positioning. The algorithm has high application value in complex underground environments.
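The Kalman filtering of RSS readings mentioned above can be illustrated with a one-dimensional filter. This is a minimal sketch, assuming the true signal strength is roughly constant between samples (a random-walk state model); the abstract does not give the authors' state model or noise settings, so `process_var`, `meas_var`, and the function name are illustrative assumptions.

```python
def kalman_smooth(measurements, process_var=0.01, meas_var=4.0):
    """Smooth a sequence of RSS readings (dBm) with a scalar Kalman filter."""
    estimate = measurements[0]   # initialize from the first reading
    error_var = meas_var         # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        error_var += process_var
        # Update: blend the prediction and the new reading by the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed
```

With a small `process_var` relative to `meas_var`, an isolated spike in the readings moves the estimate only part of the way, which is the damping effect wanted before fingerprint matching.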
Funding: Supported by the Major Program of the National Natural Science Foundation of China (No. 32192434), the Fundamental Research Funds of the Chinese Academy of Forestry (No. CAFYBB2019ZD001), and the National Key Research and Development Program of China (2016YFD060020602).
Abstract: Accurately estimating the volume growth of forest ecosystems is important for understanding carbon sequestration and achieving carbon neutrality goals. However, the key environmental factors affecting volume growth differ across scales and plant functional types. This study therefore used random forest algorithms to estimate the volume growth of Larix and Quercus forests and its influencing factors, based on national-scale forestry inventory data in China. The results showed that the models of volume growth performed better in natural forests (R² = 0.65 for Larix and 0.66 for Quercus) than in planted forests (R² = 0.44 for Larix and 0.40 for Quercus). In both natural and planted forests, stand age showed a strong relative importance for volume growth (8.6%–66.2%), while the edaphic and climatic variables had a limited relative importance (<6.0%). The relationship between stand age and volume growth was unimodal in natural forests and increased linearly in planted Quercus forests. The specific locations (i.e., altitude and aspect) of sampling plots exhibited high relative importance for volume growth in planted forests (4.1%–18.2%). Altitude affected volume growth positively in planted Larix forests but negatively in planted Quercus forests. Similarly, the effects of other environmental factors on volume growth differed between stand origins (planted versus natural) and plant functional types (Larix versus Quercus). These results highlight that stand age was the most important predictor of volume growth and that environmental factors had diverse effects on volume growth among stand origins and plant functional types. Our findings provide a framework for site-specific recommendations on the management practices needed to maintain volume growth in China's forest ecosystems.
Abstract: In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, that employed four classification algorithms as their main statistical tools were reviewed: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and Neural Network (NN). The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths, and shortcomings of each algorithm were examined, and a conclusion was reached on which one performs best. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but becomes significantly slower as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning, its high computational complexity, the numerous types of NN architectures to choose from, and the large number of training algorithms, most researchers recommend SVM and RF as easier and widely used methods that repeatedly achieve high accuracy and are often faster to implement.
Abstract: Forest harvesting adjustment is a large and complex decision-making system. In this paper, we analyze the shortcomings of traditional approaches to the harvest adjustment problem and establish a multi-objective harvest adjustment model. As an intelligent optimization method, the chaotic genetic algorithm has a parallel mechanism and inherent global optimization characteristics, which make it suitable for multi-objective planning problems, especially in complex situations with many objective functions and optimization variables. To solve the forest harvesting adjustment problem, this paper first applies a genetic algorithm to forest harvesting adjustment at the Qiujia Forest Farm in Liancheng, Longyan. The experimental results show that the method is feasible and effective and can provide satisfactory solutions for policy makers.
Abstract: Given a simple graph G with n vertices, m edges, and k connected components, the spanning forest problem is to find a spanning tree for each connected component of G. This problem has applications to the electrical power demand problem, computer network design, circuit analysis, etc. In this paper, we present an?time parallel algorithm with processors for constructing a spanning forest on a proper circle graph G on the EREW PRAM.
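To make the problem statement concrete, the sketch below builds a spanning forest of a general graph sequentially with a union-find structure. It is not the parallel EREW PRAM algorithm of the paper and does not exploit the structure of proper circle graphs; it only illustrates the output being sought, namely n - k edges that form one spanning tree per connected component. The function name and the edge-list input format are illustrative assumptions.

```python
def spanning_forest(n, edges):
    """Return a list of edges forming a spanning tree of each connected component.

    Vertices are numbered 0..n-1; edges is an iterable of (u, v) pairs.
    """
    parent = list(range(n))

    def find(v):
        # Follow parent links to the component root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:        # the edge joins two different components
            parent[root_u] = root_v
            forest.append((u, v))
    return forest
```

For a graph with n vertices and k connected components, the returned list always has n - k edges, whatever order the input edges arrive in.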
Abstract: The Very Fast Decision Tree (VFDT) algorithm is a classification algorithm for data streams. When processing large amounts of data, VFDT requires less time than traditional decision tree algorithms. However, when training samples are scarce, the label values of VFDT leaf nodes contain more errors, and the classification ability of a single VFDT decision tree is limited. The Random Forest algorithm is an ensemble classifier with high prediction accuracy and noise tolerance. It consists of multiple decision trees and can compensate for the shortcomings of a single decision tree. In this paper, to improve the classification accuracy on data streams, the Random Forest algorithm is integrated into the tree-building process of the VFDT algorithm, and a new Random Forest Based Very Fast Decision Tree algorithm named RFVFDT is designed. The RFVFDT algorithm adopts the decision tree building criterion of a Random Forest classifier and extends the Random Forest algorithm with a sliding window to handle the unboundedness of data streams and avoid processing delay and data loss. Experimental results on the classification of the KDD CUP data sets show that the classification accuracy of the RFVFDT algorithm is higher than that of VFDT. The fewer the samples, the more obvious the advantage. RFVFDT is fast when running in multithread mode.
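The abstract does not restate how VFDT decides when a leaf has seen enough examples to split. In the standard formulation of VFDT, that decision rests on the Hoeffding bound, sketched below. The use of information gain with range log2(number of classes), the tie threshold, and the function names are assumptions for illustration; the abstract does not say which settings RFVFDT uses.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with the given range lies within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n, tie_threshold=0.05):
    """Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon is so small that the two are effectively tied."""
    epsilon = hoeffding_bound(math.log2(n_classes), delta, n)
    return (best_gain - second_gain) > epsilon or epsilon < tie_threshold
```

Because epsilon shrinks as 1/sqrt(n), a leaf that has seen few samples rarely has enough evidence to split, which is consistent with the abstract's remark that a single VFDT tree is weaker when training samples are scarce.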
Funding: Supported by the National Natural Science Foundation of China (Grant Number 41961060), the Program for Innovative Research Team (in Science and Technology) in the University of Yunnan Province (Grant Number IRTSTYN), the Scientific Research Fund Project of the Education Department of Yunnan Province (Grant Numbers 2020J0256 and 2021J0438), and the Postgraduate Scientific Research and Innovation Fund Project of Yunnan Normal University (Grant Number YJSJJ21-A08).
Abstract: Airborne laser scanning (ALS) and terrestrial laser scanning (TLS) have attracted attention for their applications in forest parameter investigation and research. ALS is limited in obtaining fine structure information below the forest canopy because of occlusion by trees in natural forests. In contrast, TLS is unable to gather fine structure information about the upper canopy. To address the incomplete acquisition of natural forest point cloud data by ALS or TLS on a single platform, this study proposes data registration without control points. The original ALS and TLS data were cropped according to the sample plot size, and the ALS point cloud data were converted into relative coordinates with the center of the cropped data as the origin. Matching feature point pairs in the ALS and TLS point cloud data were then selected to register the point clouds. The initially registered point cloud data were then finely registered with the iterative closest point (ICP) algorithm. The results show that the proposed method achieved high-precision registration of ALS and TLS point cloud data from two natural forest plots, of Pinus yunnanensis Franch. and Picea asperata Mast., which covered different species and environments. Average registration accuracies of 0.06 m and 0.09 m were obtained for P. yunnanensis and P. asperata, respectively.
Abstract: Building on previous research on predicting β-hairpin motifs in proteins, we apply the Random Forest and Support Vector Machine algorithms to predict β-hairpin motifs in the ArchDB40 dataset. Motifs with loop lengths of 2 to 8 amino acid residues are extracted as the research object, and a fixed-length pattern of 12 amino acids is selected. With the same characteristic parameters and the same test method, the Random Forest algorithm is more effective than the Support Vector Machine. In addition, because the Random Forest algorithm does not overfit when the dimension of the characteristic parameters is higher, we use Random Forest with higher-dimensional characteristic parameters to predict β-hairpin motifs. Better prediction results are obtained: the overall accuracy and Matthews correlation coefficient under 5-fold cross-validation reach 83.3% and 0.59, respectively.
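The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name and the convention of returning 0 when the denominator vanishes are illustrative choices, not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike overall accuracy, the coefficient stays near zero for a classifier that always predicts the majority class, which is why it is often reported alongside accuracy when motif and non-motif samples are imbalanced.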
Funding: The authors acknowledge the financial support of this work from the National Natural Science Foundation of China (Grant No. 11972073, Grant No. 51974357, and Grant No. 52274027), the China Postdoctoral Science Foundation (Grant No. 2022M713204), and the Scientific Research and Technology Development Project of China National Petroleum Corporation (Grant No. 2121DJ2301).
Abstract: Production optimization is important for carbonate reservoirs because it directly affects the sustainability and profitability of reservoir development. Traditional physics-based numerical simulations suffer from insufficient calculation accuracy and excessive time consumption when used for production optimization. We establish an ensemble proxy-model-assisted optimization framework that combines the Bayesian random forest (BRF) with the particle swarm optimization (PSO) algorithm. The BRF method is used to construct a proxy model of the injection-production system that can accurately predict the dynamic parameters of producers from injection data and production measures. With the help of the proxy model, PSO is applied to search for the optimal injection pattern, integrating Pareto front analysis. In experimental testing, the proxy model not only achieves higher prediction accuracy than deep learning but also requires 8 times less training time. In addition, the injection mode adjusted by the PSO algorithm can effectively reduce the gas-oil ratio and increase oil production by more than 10% for carbonate reservoirs. The proposed proxy-model-assisted optimization protocol offers a new perspective on multi-objective optimization problems in the petroleum industry and gives project decision-makers more options for balancing oil production and the gas-oil ratio under physical and operational constraints.
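The search step described above can be illustrated with a basic particle swarm optimizer. This is a minimal single-objective sketch: it does not include the Bayesian random forest proxy or the Pareto front analysis of the paper, and the inertia and acceleration coefficients, the box-constraint clamping, and the function name `pso_minimize` are assumptions. In the setting of the abstract, `objective` would be a (hypothetical) function that queries the proxy model for a candidate injection pattern.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, iterations=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box; bounds is a list of (low, high) pairs."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the feasible box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Each iteration evaluates the objective once per particle, so a fast proxy in place of a full reservoir simulation is what makes this kind of search affordable.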
Funding: The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through Project Number IF2-PSAU-2022/01/22043.
Abstract: Autism spectrum disorder (ASD), classified as a developmental disability, is now more common in children than ever. The drastic increase in the rate of ASD in children worldwide demands early detection. When ASD is diagnosed before the age of five, parents can seek professional help for a better prognosis of the child's therapy. This research study aims to develop an automated tool for diagnosing autism in children. The computer-aided diagnosis tool for ASD detection is designed and developed using a novel methodology that includes data acquisition, feature selection, and classification phases. The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification. The Imperialistic competitive algorithm (ICA), based on empires conquering colonies, performs feature selection in this study. The performance of the Logistic Regression (LR), Decision Tree, K-Nearest Neighbor (KNN), and Random Forest (RF) classifiers is studied experimentally. The experimental results show that the Logistic Regression classifier exhibits the highest accuracy on the self-acquired dataset. ASD detection is also evaluated experimentally with the Least Absolute Shrinkage and Selection Operator (LASSO) feature selection method and different classifiers. The Exploratory Data Analysis (EDA) phase uncovered crucial facts about the data, such as the correlation of the features in the dataset with the class variable.