Funding: Supported by the Innovative Research Groups of the National Natural Science Foundation of China (No. 51621092); the National Basic Research Program of China ("973" Program, No. 2013CB035904); and the National Natural Science Foundation of China (No. 51439005).
Abstract: During the rolling construction of the storehouse surface of a core rockfill dam, the spreading thickness of the dam face is an important factor affecting the construction quality of the rolled storehouse surface and the overall quality of the entire dam. Currently, spreading thickness is monitored and controlled during construction by manual sampling checks after spreading, which makes it difficult to monitor the entire storehouse surface. In this paper, we present an in-depth study based on the theory of real-time monitoring and control of storehouse-surface rolling construction, and we obtain the rolling compaction thickness by analyzing the construction track of the rolling machine. By comparison, the traditional method can only analyze the rolled thickness of the storehouse surface after it has been compacted and cannot determine the thickness in real time. To solve these problems, our system monitors the construction progress of the leveling machine and employs a real-time spreading thickness monitoring model based on the K-nearest neighbor algorithm. Taking the LHK core rockfill dam in Southwest China as an example, we performed real-time monitoring of the spreading thickness and conducted real-time interactive queries on it. This approach provides a new method for controlling the spreading thickness of the storehouse surface of core rockfill dams.
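The abstract does not spell out the KNN model. One plausible reading is sketched below under stated assumptions. It treats leveling-machine position fixes as scattered elevation samples. The spread-surface elevation at a query point is an inverse-distance-weighted average of the k nearest fixes. Spreading thickness is that elevation minus the elevation of the compacted layer underneath. `TrackPoint`, `knn_elevation`, `spreading_thickness`, the weighting scheme, and the thickness definition are all hypothetical; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackPoint:
    """Hypothetical leveling-machine fix: plane coordinates (m) and spread-surface elevation (m)."""
    x: float
    y: float
    z: float


def knn_elevation(points, qx, qy, k=3, eps=1e-9):
    """Inverse-distance-weighted mean elevation of the k track points nearest to (qx, qy)."""
    nearest = sorted(points, key=lambda p: math.hypot(p.x - qx, p.y - qy))[:k]
    weights = []
    for p in nearest:
        d = math.hypot(p.x - qx, p.y - qy)
        if d < eps:  # the query coincides with a measured point
            return p.z
        weights.append(1.0 / d)
    return sum(w * p.z for w, p in zip(weights, nearest)) / sum(weights)


def spreading_thickness(points, qx, qy, compacted_elevation, k=3):
    """Assumed definition: interpolated spread-surface elevation minus the compacted layer below."""
    return knn_elevation(points, qx, qy, k) - compacted_elevation


track = [TrackPoint(0, 0, 100.9), TrackPoint(1, 0, 101.0),
         TrackPoint(0, 1, 101.1), TrackPoint(5, 5, 103.0)]
print(round(spreading_thickness(track, 0.5, 0.5, compacted_elevation=100.0), 3))
```

In a monitoring loop, the same query would presumably run over a grid of cells on the storehouse surface each time new fixes arrive, so that under-thick or over-thick cells can be flagged before rolling.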
Funding: Deputyship for Research and Innovation, Ministry of Education, Saudi Arabia (IFKSUOR3-014-3).
Abstract: In this study, we address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for using GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to make feature selection more effective in microarray-based cancer classification. We used leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets and comprehensively compared with recently published methods. Our hybrid algorithm improves classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm that our method can substantially improve both the precision and the efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared with other methods highlights its potential applicability to real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we exploit their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
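In wrapper feature selection of this kind, each candidate gene subset is usually encoded as a binary mask and scored by a fitness function evaluated with LOOCV. The sketch below shows one such fitness. It assumes a k-NN classifier and the common weighted form alpha * error + (1 - alpha) * |selected| / |all|. The GWO and HHO position updates are not shown. `fitness`, `alpha`, and the toy data are illustrative assumptions, not the authors' settings.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, mask, k=1):
    """Majority vote among the k nearest training samples, using only genes where mask[j] == 1."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in range(len(mask)) if mask[j]))
    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return Counter(train_y[i] for i in ranked[:k]).most_common(1)[0][0]


def loocv_accuracy(X, y, mask, k=1):
    """Leave-one-out: each sample is classified by a k-NN built on all the other samples."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], mask, k) == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99, k=1):
    """Value to minimize: weighted LOOCV error plus the fraction of genes kept (assumed form)."""
    if not any(mask):
        return float("inf")  # an empty gene subset cannot classify anything
    error = 1.0 - loocv_accuracy(X, y, mask, k)
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)


# Toy "expression" matrix: 4 samples x 2 genes; gene 0 separates the classes, gene 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [1.0, 7.0], [1.1, -2.0]]
y = [0, 0, 1, 1]
print(loocv_accuracy(X, y, [1, 0]), loocv_accuracy(X, y, [0, 1]))
```

A metaheuristic such as the GWO-HHO hybrid would then search over masks, preferring those with lower `fitness`.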
Abstract: Compositional data, which carry relative information, are a crucial aspect of machine learning and related fields. Such data are typically recorded in closed form, that is, their parts sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when data are missing. The linear regression model is a common statistical modeling technique used in many applications to find relationships between variables of interest. Maximum likelihood estimation (MLE) is the method of choice for estimating linear regression parameters, which are useful for tasks such as prediction and analyzing the partial effects of independent variables. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of statistical models that depend on unobserved variables or data. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performs on a synthetic compositional dataset with missing observations, using both ordinary least squares and robust least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) imputation and mean imputation, in terms of Aitchison distances and covariance.
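The Aitchison distance used above as an evaluation criterion can be computed through the centred log-ratio (clr) transform. The sketch below follows that standard definition. It assumes strictly positive parts, so zeros would have to be replaced first. It is not code from the study, and the example compositions are hypothetical.

```python
import math


def closure(x, total=1.0):
    """Rescale positive parts so they sum to `total` (e.g. 1 or 100%)."""
    s = sum(x)
    return [total * v / s for v in x]


def clr(x):
    """Centred log-ratio transform: log of each part minus the mean log over all parts."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Hypothetical check of an imputed composition against the true one (both closed to 100%).
true_comp = closure([20, 30, 50], total=100)
imputed = closure([25, 30, 45], total=100)
print(aitchison_distance(true_comp, imputed))
```

Because clr removes the overall scale, the distance depends only on ratios between parts. That is why it suits closed data better than a plain Euclidean distance on percentages.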
Funding: Supported by the National Science Fund for Distinguished Young Scholars of China (61525304) and the National Natural Science Foundation of China (61873328).
Abstract: In this paper, a memetic algorithm with competition (MAC) is proposed to solve the capacitated green vehicle routing problem (CGVRP). First, a permutation array called the traveling salesman problem (TSP) route is used to encode each solution, and an effective decoding method is presented to construct the corresponding CGVRP route. Second, a k-nearest neighbor (kNN) based initialization is presented to make use of the customers' location information. Third, according to the characteristics of the CGVRP, the search operators in the variable neighborhood search (VNS) framework and the simulated annealing (SA) strategy are applied to the TSP route of every solution. Moreover, after competition, the customer adjustment operator and the alternative fuel station (AFS) adjustment operator are applied to the CGVRP routes of the elite solutions. In addition, a crossover operator is employed to share information among different solutions. The effect of parameter settings is investigated using the Taguchi design-of-experiments method to suggest suitable values. Numerical tests demonstrate the effectiveness of both the competitive search and the decoding method. Moreover, extensive comparative results show that the proposed algorithm is more effective and efficient than existing methods in solving the CGVRP.
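The sketch below illustrates two ideas from this abstract in simplified form. The first is a kNN-based construction of a giant TSP route, which repeatedly moves to a random one of the k nearest unvisited customers. The second is a greedy decoding that cuts the route into depot-to-depot trips at capacity limits. Both are assumptions about how such steps might look. The authors' decoder also inserts alternative fuel stations for the fuel constraint, and that is omitted here. The names `knn_initial_route` and `split_by_capacity` are hypothetical.

```python
import math
import random


def knn_initial_route(coords, depot=0, k=3, rng=None):
    """Giant TSP route: from the current node, jump to a random one of its k nearest unvisited customers."""
    if rng is None:
        rng = random.Random(0)
    unvisited = set(range(len(coords))) - {depot}
    route, current = [], depot
    while unvisited:
        candidates = sorted(unvisited, key=lambda c: math.dist(coords[current], coords[c]))[:k]
        current = rng.choice(candidates)
        route.append(current)
        unvisited.remove(current)
    return route


def split_by_capacity(route, demand, capacity, depot=0):
    """Greedy decoding: start a new depot-to-depot trip whenever the next customer would overload the vehicle."""
    trips, trip, load = [], [depot], 0
    for c in route:
        if demand[c] > capacity:
            raise ValueError(f"customer {c} alone exceeds vehicle capacity")
        if load + demand[c] > capacity:
            trips.append(trip + [depot])
            trip, load = [depot], 0
        trip.append(c)
        load += demand[c]
    if len(trip) > 1:
        trips.append(trip + [depot])
    return trips


coords = [(0, 0), (1, 0), (2, 0), (3, 0)]  # node 0 is the depot
route = knn_initial_route(coords, k=1)
print(route, split_by_capacity(route, demand=[0, 2, 2, 2], capacity=4))
```

A larger `k` trades greediness for diversity in the initial population, which a memetic algorithm usually needs.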
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 2017YFC0403605 and No. 11601419).
Abstract: The whale optimization algorithm (WOA) is a new population-based meta-heuristic algorithm. WOA updates the whales' positions using a shrinking encircling mechanism, a spiral rise, and a random learning strategy. WOA has the merits of simple calculation and high computational accuracy, but it converges slowly and easily falls into local optima. To overcome these shortcomings, this paper integrates adaptive neighborhood and hybrid mutation strategies into the whale optimization algorithm. Each whale's average distance to the other whales is used as its adaptive neighborhood radius, and the whale learns from the best solution in its neighborhood instead of following the random learning strategy. The hybrid mutation strategy is used to enhance the algorithm's ability to escape local optima. The resulting new whale optimization algorithm is called HMNWOA. The proposed algorithm inherits the global search capability of the original algorithm, enhances its exploitation ability, and improves the quality of the population, thereby speeding up convergence. A feature selection algorithm based on binary HMNWOA is also proposed. Twelve standard datasets from the UCI repository are used to test the validity of the proposed algorithm for feature selection. The experimental results show that HMNWOA is very competitive with six other popular feature selection methods in improving classification accuracy and reducing the number of features, indicating that HMNWOA has strong search ability in the feature space.
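The adaptive-neighborhood idea described above can be sketched as follows. The radius of whale i is its mean Euclidean distance to all other whales. The exploration step of standard WOA, X(t+1) = X_rand - A * |C * X_rand - X|, is then applied with the neighborhood best in place of the random whale. This substitution is our reading of the abstract, not the authors' code, and the hybrid mutation step is not shown. Fitness is assumed to be minimized.

```python
import math


def adaptive_radius(positions, i):
    """Neighborhood radius of whale i: its mean distance to every other whale."""
    others = [p for j, p in enumerate(positions) if j != i]
    return sum(math.dist(positions[i], p) for p in others) / len(others)


def neighborhood_best(positions, fitness, i):
    """Index of the fittest whale (lowest fitness) within whale i's adaptive radius.

    Whale i always lies in its own neighborhood, so a result always exists.
    """
    r = adaptive_radius(positions, i)
    neighbors = [j for j, p in enumerate(positions) if math.dist(positions[i], p) <= r]
    return min(neighbors, key=lambda j: fitness[j])


def explore_step(x, x_nbest, A, C):
    """WOA search-for-prey update with the random whale replaced by the neighborhood best (assumed form)."""
    return [nb - A * abs(C * nb - xi) for xi, nb in zip(x, x_nbest)]


positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
fitness = [5.0, 1.0, 0.0]  # whale 2 is the global best but lies outside whale 0's neighborhood
b = neighborhood_best(positions, fitness, 0)
print(b, explore_step(list(positions[0]), list(positions[b]), A=0.5, C=1.0))
```

Learning from a local best rather than a random whale keeps some diversity, since distant good solutions do not pull every whale at once, while still directing each move toward better regions.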
Abstract: The EM algorithm is a popular iterative method for computing maximum likelihood estimates when the observed data are incomplete, and it is also very effective for estimating the parameters of finite mixture models. However, the EM algorithm cannot guarantee finding the global optimum and often converges to a local optimum, so its result is sensitive to the initial values of the iteration. The traditional EM algorithm selects initial values at random; we propose an improved method for selecting them. First, the k-nearest-neighbor method is used to delete outliers. Second, k-means is used to initialize the EM algorithm. In numerical experiments comparing this method with random initialization, the parameter estimates obtained with the proposed initialization are significantly better than those of the original EM algorithm.
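The two-step initialization above can be sketched for a one-dimensional Gaussian mixture as follows. Each point is scored by its mean distance to its k nearest neighbors, and the highest-scoring fraction is dropped. k-means then runs on the remaining data, and the cluster sizes, means, and variances become EM's starting weights, means, and variances. The fixed drop fraction `frac`, the deterministic k-means seeding, and all function names are assumptions; the paper's exact outlier rule is not given in the abstract.

```python
import statistics


def knn_outlier_scores(data, k=3):
    """Score each 1-D point by its mean distance to its k nearest other points."""
    scores = []
    for i, x in enumerate(data):
        d = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        scores.append(sum(d[:k]) / k)
    return scores


def remove_outliers(data, k=3, frac=0.05):
    """Drop the `frac` share of points with the largest k-NN scores (assumed threshold rule)."""
    scores = knn_outlier_scores(data, k)
    n_drop = int(len(data) * frac)
    keep = sorted(range(len(data)), key=lambda i: scores[i])[: len(data) - n_drop]
    return [data[i] for i in sorted(keep)]


def kmeans_1d(data, n_clusters, iters=50):
    """Lloyd's k-means in 1-D, seeded with evenly spaced order statistics for determinism."""
    s = sorted(data)
    centers = [s[(2 * c + 1) * len(s) // (2 * n_clusters)] for c in range(n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in data:
            groups[min(range(n_clusters), key=lambda c: abs(x - centers[c]))].append(x)
        centers = [statistics.fmean(g) if g else centers[c] for c, g in enumerate(groups)]
    return centers, groups


def em_initial_params(data, n_components, k=3, frac=0.05):
    """Starting weights, means, and variances for a Gaussian-mixture EM run."""
    clean = remove_outliers(data, k, frac)
    means, groups = kmeans_1d(clean, n_components)
    weights = [len(g) / len(clean) for g in groups]
    variances = [statistics.pvariance(g) if len(g) > 1 else 1.0 for g in groups]
    return weights, means, variances


data = [0.0, 0.1, 0.2, -0.1, -0.2, 5.0, 5.1, 4.9, 5.2, 100.0]  # 100.0 is an injected outlier
print(em_initial_params(data, n_components=2, k=3, frac=0.1))
```

These values would replace the random starting point of an ordinary EM loop. The EM iterations themselves are unchanged.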
Abstract: Most machinery in small- and large-scale industry has rotating elements supported by bearings, which provide rigid support and accurate movement. For machinery to function properly, condition monitoring of the bearings is very important. In the present study, sound signals are used to continuously monitor bearing health, because the sound signals of rotating machinery carry dynamic information about its components. Numerous studies in the literature report the superiority of vibration signals for bearing fault diagnosis; however, very few studies have used sound signals. The cost of condition monitoring with sound signals (a microphone) is lower than the cost of the transducer used to acquire vibration signals (an accelerometer). This paper employs sound signals for condition monitoring of roller bearings using a K-star classifier and a k-nearest neighbor classifier. Statistical features are extracted from the acquired sound signals. Two-layer feature selection is then performed using the J48 decision tree algorithm and the random tree algorithm. The selected features are classified using the K-star and k-nearest neighbor classifiers, and parametric optimization is performed to achieve the maximum classification accuracy. The classification results of the K-star and k-nearest neighbor classifiers for condition monitoring of roller bearings using sound signals were compared.
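The abstract does not list which statistical features were extracted. The sketch below computes a commonly used time-domain set (mean, standard deviation, skewness, kurtosis, RMS, peak, range, crest factor) for one sound frame; this choice of features is an assumption. A feature selector such as J48 would then choose a subset, which feeds a K-star or k-nearest neighbor classifier. The example frame is synthetic and hypothetical.

```python
import math
import statistics


def statistical_features(signal):
    """Time-domain descriptors of one sampled sound frame (an assumed, commonly used feature set)."""
    n = len(signal)
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)  # population standard deviation
    centered = [x - mean for x in signal]
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    return {
        "mean": mean,
        "std": std,
        # population (moment) skewness and kurtosis; kurtosis here is not excess kurtosis
        "skewness": sum(c ** 3 for c in centered) / (n * std ** 3) if std else 0.0,
        "kurtosis": sum(c ** 4 for c in centered) / (n * std ** 4) if std else 0.0,
        "rms": rms,
        "peak": peak,
        "range": max(signal) - min(signal),
        "crest_factor": peak / rms if rms else 0.0,
    }


# Hypothetical frame: a synthetic tone plus one impulsive spike, as a localized defect might produce.
frame = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
frame[50] += 3.0
feats = statistical_features(frame)
print({name: round(value, 3) for name, value in feats.items()})
```

Impulsive content raises kurtosis and crest factor well above the values of a clean tone. That is one reason these two descriptors are often retained by feature selection in bearing diagnosis.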
Funding: Financially supported by the National Natural Science Foundation of China (No. 51974018) and the Open Foundation of the State Key Laboratory of Process Automation in Mining and Metallurgy (No. BGRIMM-KZSKL-2022-9).
Abstract: Slurry electrolysis (SE), a hydrometallurgical process, involves multiple tanks connected in series, which leads to varied stirring conditions and complex solid suspension states. Because computational fluid dynamics (CFD) requires substantial computing resources, CFD was combined with machine learning to construct a rapid prediction model for the liquid flow and solid concentration fields in an SE tank. Calculation samples were selected systematically through orthogonal experiments, establishing a comprehensive dataset that covers a wide range of conditions while reducing the number of simulations and assigning reasonable weights to each factor. A prediction model of the SE tank was then constructed using the K-nearest neighbor algorithm. The results show that the prediction accuracy of the model improved markedly as the number of levels in the orthogonal experiments increased. The model built with four factors and nine levels accurately predicts the flow and concentration fields, with regression coefficients of 0.926 for average velocity and 0.937 for solid concentration. Compared with traditional CFD, the response time for predicting field information was reduced from 75 h to 20 s. This resolves the severe lag of standalone CFD in actual production and meets real-time production control requirements.
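A minimal sketch of a KNN surrogate of this kind follows. It assumes the simulated orthogonal-experiment cases are stored as rows of factor settings (X), each with one CFD output (y), such as average velocity. Several details are assumptions: the min-max scaling of factors, the plain average over the k nearest cases, and reading the reported "regression coefficients" as coefficients of determination R^2. The factor values in the example are hypothetical.

```python
import math


def minmax_fit(X):
    """Per-factor minima and maxima, used to put factors with different units on a [0, 1] scale."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def minmax_apply(x, lo, hi):
    """Scale one factor vector; a constant factor maps to 0."""
    return [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(x, lo, hi)]


def knn_regress(X, y, query, k=3):
    """Predict a response as the mean output of the k simulated cases nearest to `query`."""
    lo, hi = minmax_fit(X)
    Xn = [minmax_apply(x, lo, hi) for x in X]
    q = minmax_apply(query, lo, hi)
    order = sorted(range(len(X)), key=lambda i: math.dist(Xn[i], q))
    return sum(y[i] for i in order[:k]) / k


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical design: two factors with very different ranges, one CFD output per case.
X = [[0, 0], [0, 10], [10, 0], [10, 10]]
y = [0.0, 1.0, 2.0, 3.0]
print(knn_regress(X, y, [0, 1], k=1), knn_regress(X, y, [0, 1], k=2))
```

Prediction is only a sort over stored cases, which is consistent with the reported drop from hours of CFD to seconds of lookup. Accuracy, however, depends on how densely the orthogonal design covers the operating space.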
Funding: Supported by the Nanjing University of Aeronautics and Astronautics (KFB2305601).
Abstract: The complexity and unpredictability of clear air turbulence (CAT) pose significant challenges to aviation safety. Accurate prediction of turbulence events is crucial for reducing flight accidents and economic losses. However, traditional turbulence prediction methods, such as ensemble forecasting techniques, have certain limitations: they consider only turbulence data from the most recent period, making it difficult to capture the nonlinear relationships present in turbulence. This study proposes a turbulence forecasting model based on the K-nearest neighbor (KNN) algorithm. The model uses a combination of eight CAT diagnostic features as the feature vector and introduces CAT diagnostic feature weights to improve prediction accuracy. Its inputs are seven years of CAT diagnostics between 125 and 500 hPa, computed from the ECMWF fifth-generation reanalysis (ERA5). These are paired with labels from annotated Pilot Reports (PIREPs), and each sample contributes to the prediction result. By measuring the distance between the current CAT diagnostic variables and those of other samples, the model identifies the climatologically most similar neighbors and determines the turbulence intensity category associated with the current variables. To evaluate the model's performance in diagnosing upper-level turbulence over Colorado, PIREP cases were randomly selected for analysis. The results show that the weighted KNN (W-KNN) model has higher skill in turbulence prediction. It outperforms traditional prediction methods and other machine learning models (e.g., Random Forest) in capturing moderate-or-greater (MOG) turbulence. Its performance was confirmed using the receiver operating characteristic (ROC) curve, the maximum True Skill Statistic (maxTSS = 0.552), and a reliability diagram. The model achieved a robust score (area under the curve, AUC = 0.86) and was sensitive to seasonal and annual climate fluctuations.
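The sketch below shows the two ingredients named in this abstract. The first is a KNN classifier whose distance weights each CAT diagnostic. The second is the True Skill Statistic, TSS = probability of detection minus probability of false detection. The weight values, the plain majority vote, the label names "MOG"/"NEG", and the two-feature toy data are assumptions rather than the study's configuration. The reported maxTSS would be the maximum of TSS over probability thresholds; this sketch computes TSS for a single set of categorical predictions.

```python
import math
from collections import Counter


def weighted_distance(a, b, w):
    """Euclidean distance in which each CAT diagnostic j is scaled by a weight w[j]."""
    return math.sqrt(sum(wj * (x - y) ** 2 for x, y, wj in zip(a, b, w)))


def wknn_predict(X, labels, query, w, k=5):
    """Majority intensity class among the k samples nearest to `query` under the weighted distance."""
    nearest = sorted(range(len(X)), key=lambda i: weighted_distance(X[i], query, w))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def true_skill_statistic(y_true, y_pred, positive="MOG"):
    """TSS = hit rate (probability of detection) minus probability of false detection."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp / (tp + fn) - fp / (fp + tn)


# Two hypothetical diagnostics per sample; down-weighting the first changes which neighbor is nearest.
X = [[8.0, 0.0], [0.0, 5.0]]
labels = ["MOG", "NEG"]
print(wknn_predict(X, labels, [0.0, 0.0], w=[1.0, 1.0], k=1))
print(wknn_predict(X, labels, [0.0, 0.0], w=[0.1, 1.0], k=1))
```

The example shows why feature weights matter: the same query is assigned a different intensity class once the first diagnostic counts for less in the distance.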