Photovoltaic (PV) power generation is characterized by randomness and intermittency due to weather changes. Consequently, large-scale PV power connections to the grid can threaten the stable operation of the power system. An effective way to resolve this problem is to accurately predict PV power. In this study, an innovative short-term hybrid prediction model of PV power (HKSL) is established. The model combines K-means++, the optimal similar day approach, and a long short-term memory (LSTM) network, and utilizes historical power data and meteorological factors. The model searches for the best similar day based on the results of classifying weather types; the data of that similar day are then input into the LSTM network to predict PV power. The validity of the hybrid model is verified on datasets from a PV power station in Shandong Province, China. Four evaluation indices, mean absolute error, root mean square error (RMSE), normalized RMSE, and mean absolute deviation, are employed to assess the performance of the HKSL model. The RMSE of the proposed model decreases by 66.73%, 70.22%, 65.59%, 70.51%, and 18.40% compared with those of Elman, LSTM, HSE (a hybrid model combining the similar day approach and Elman), HSL (a hybrid model combining the similar day approach and LSTM), and HKSE (a hybrid model combining K-means++, the similar day approach, and Elman), respectively. This demonstrates the reliability and excellent performance of the proposed hybrid model in predicting PV power.
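A minimal sketch of the first two stages of such a pipeline, written under stated assumptions rather than from the paper's code: daily meteorological feature vectors and daily power curves are assumed to be available as NumPy arrays (the names daily_meteo, daily_power, target_meteo and the cluster count are illustrative). Weather types are obtained with K-means++ seeding, the most similar historical day within the target day's weather type is selected, and its data would then be fed to an LSTM.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def select_similar_day(daily_meteo, daily_power, target_meteo, n_weather_types=4):
    """Classify weather types with K-means++ seeding and pick the most similar
    historical day (smallest Euclidean distance) within the target day's type."""
    scaler = StandardScaler()
    X = scaler.fit_transform(daily_meteo)                  # one feature row per day
    km = KMeans(n_clusters=n_weather_types, init="k-means++", n_init=10, random_state=0)
    labels = km.fit_predict(X)
    t = scaler.transform(np.asarray(target_meteo).reshape(1, -1))
    candidates = np.where(labels == km.predict(t)[0])[0]   # days of the same weather type
    best = candidates[np.argmin(np.linalg.norm(X[candidates] - t, axis=1))]
    return best, daily_power[best]

def build_lstm(timesteps, n_features):
    """Minimal LSTM regressor; the paper's exact architecture is not reproduced here."""
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    model = Sequential([LSTM(64, input_shape=(timesteps, n_features)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    return model
```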
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in many areas to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and analyzing the partial effects of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can make data recovery costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
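A simplified sketch of EM-style imputation under a multivariate Gaussian working model, assuming missing entries are coded as NaN; the E-step below uses only conditional means (the conditional-covariance correction terms of the full EM update are omitted for brevity), and the function name and settings are illustrative rather than taken from the study.

```python
import numpy as np

def em_gaussian_impute(X, n_iter=100, tol=1e-6):
    """EM-style imputation: the E-step replaces missing entries with their conditional
    means given the observed entries and the current (mu, Sigma); the M-step
    re-estimates mu and Sigma from the completed data."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    X_f = np.where(miss, np.nanmean(X, axis=0), X)          # start from column means
    for _ in range(n_iter):
        mu = X_f.mean(axis=0)
        Sigma = np.cov(X_f, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        X_new = X_f.copy()
        for i in np.where(miss.any(axis=1))[0]:
            m, o = miss[i], ~miss[i]
            S_oo_inv = np.linalg.inv(Sigma[np.ix_(o, o)])
            # conditional mean of the missing block given the observed block
            X_new[i, m] = mu[m] + Sigma[np.ix_(m, o)] @ S_oo_inv @ (X_f[i, o] - mu[o])
        if np.max(np.abs(X_new - X_f)) < tol:
            return X_new
        X_f = X_new
    return X_f
```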
Several pests feed on the leaves, stems, bases, and the entire plant, causing plant illnesses. As a result, it is vital to identify and eliminate a disease before it causes any damage to the plants. Manually detecting and treating plant disease is quite challenging because it requires much effort and an extended processing period, so image processing is employed to detect plant disease. The main goal of this study is to discover the diseases that affect plants by creating an image processing system that can recognize and classify four different forms of plant disease: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. This work uses a support vector machine (SVM) classifier to detect and classify plant disease through several steps: image acquisition, pre-processing, segmentation, feature extraction, and classification. The gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental data, the proposed technique can correctly detect and diagnose plant sickness with 97.2 percent accuracy.
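A small sketch of the texture-feature stage using scikit-image (graycomatrix naming of recent versions) and scikit-learn, assuming segmented grayscale (uint8) leaf images are already available; the GLCM distances and angles, the LBP parameters, and the SVM settings shown are illustrative choices, not the paper's.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern
from sklearn.svm import SVC

def leaf_texture_features(gray_img):
    """GLCM + LBP texture features from a segmented grayscale (uint8) leaf image."""
    glcm = graycomatrix(gray_img, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p).mean()
                  for p in ("contrast", "correlation", "energy", "homogeneity")]
    lbp = local_binary_pattern(gray_img, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([glcm_feats, lbp_hist])

# X = np.array([leaf_texture_features(img) for img in leaf_images]); y = disease_labels
# clf = SVC(kernel="rbf", C=10, gamma="scale").fit(X, y)   # four-class disease classifier
```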
The dimensionality of data is increasing very rapidly, which creates challenges for most of the current mining and learning algorithms, such as large memory requirements and high computational costs. The literature includes much research on feature selection for supervised learning. However, feature selection for unsupervised learning has only recently been studied. Finding the subset of features in unsupervised learning that enhances the performance is challenging since the clusters are indeterminate. This work proposes a hybrid technique for unsupervised feature selection called GAk-MEANS, which combines the genetic algorithm (GA) approach with the classical k-means algorithm. In the proposed algorithm, a new fitness function is designed in addition to new smart crossover and mutation operators. The effectiveness of this algorithm is demonstrated on various datasets. Furthermore, the performance of GAk-MEANS has been compared with other genetic algorithms, such as the genetic algorithm using the Sammon Error Function and the genetic algorithm using the Sum of Squared Error Function. Additionally, the performance of GAk-MEANS is compared with the state-of-the-art statistical unsupervised feature selection techniques. Experimental results show that GAk-MEANS consistently selects subsets of features that result in better classification accuracy compared to others. In particular, GAk-MEANS is able to significantly reduce the size of the subset of selected features by an average of 86.35% (72%–96.14%), which leads to an increase of the accuracy by an average of 3.78% (1.05%–6.32%) compared to using all features. When compared with the genetic algorithm using the Sammon Error Function, GAk-MEANS is able to reduce the size of the subset of selected features by 41.29% on average, improve the accuracy by 5.37%, and reduce the time by 70.71%. When compared with the genetic algorithm using the Sum of Squared Error Function, GAk-MEANS on average is able to reduce the size of the subset of selected features by 15.91% and improve the accuracy by 9.81%, but the time is increased by a factor of 3. When compared with the machine-learning based methods, we observed that GAk-MEANS is able to increase the accuracy by 13.67% on average with an 88.76% average increase in time.
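A compact, generic sketch of coupling a binary-mask genetic algorithm with k-means for feature selection on a numeric matrix X. The fitness used here (silhouette score minus a subset-size penalty) and all GA settings are stand-ins, since the paper designs its own fitness function and smart crossover/mutation operators.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def fitness(mask, X, k=3, penalty=0.01):
    """Clustering quality on the selected features minus a subset-size penalty
    (a stand-in for the paper's custom fitness function)."""
    sel = mask.astype(bool)
    if sel.sum() == 0:
        return -np.inf
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X[:, sel])
    return silhouette_score(X[:, sel], labels) - penalty * sel.sum()

def ga_kmeans_select(X, pop_size=30, n_gen=40, p_mut=0.05, k=3):
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))            # binary feature masks
    for _ in range(n_gen):
        scores = np.array([fitness(ind, X, k) for ind in pop])
        # tournament selection of parents
        idx = [max(rng.choice(pop_size, 2), key=lambda i: scores[i]) for _ in range(pop_size)]
        parents = pop[idx]
        # single-point crossover on consecutive pairs
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # bit-flip mutation
        flip = rng.random(children.shape) < p_mut
        children[flip] = 1 - children[flip]
        pop = children
    scores = np.array([fitness(ind, X, k) for ind in pop])
    return pop[scores.argmax()].astype(bool)                # boolean mask of selected features
```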
Retailing is a dynamic business domain where commodities and goods are sold in small quantities directly to customers. It deals with the end-user customers of a supply-chain network and therefore has to accommodate the needs and desires of a large group of customers over varied utilities. The volume and volatility of the business make it one of the prospective fields for analytical study and data modeling. This is also why customer segmentation plays a key role in multiple retail business decisions such as marketing budgeting, customer targeting, customized offers, and value proposition. The segmentation could be based on various aspects such as demographics, historic behavior, or preferences, depending on the use case. In this paper, historic retail transactional data are used to segment the customers using K-means clustering, and the results are utilized to arrive at a transition matrix, which is used to predict cluster movements over time with a Markov model. This helps in calculating the future value a segment or a customer brings to the business. Strategic marketing designs and budgeting can be implemented using these results. The study is specifically useful for large-scale marketing in domains such as e-commerce, insurance, or retail to segment, profile, and measure customer lifecycle value over a short period of time.
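A brief sketch of the segmentation-to-Markov step, assuming per-period customer feature matrices are available and the same customers are observed in consecutive periods; the feature names and the number of segments are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def segment_customers(features, k=4):
    """Cluster customers on transactional features (e.g., recency/frequency/monetary)."""
    X = StandardScaler().fit_transform(features)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

def transition_matrix(labels_prev, labels_next, k=4):
    """Row-stochastic matrix of segment movements between two consecutive periods."""
    T = np.zeros((k, k))
    for a, b in zip(labels_prev, labels_next):
        T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)      # assumes every segment is non-empty

def project_distribution(current_dist, T, n_periods):
    """Markov projection of the segment distribution n_periods ahead."""
    return current_dist @ np.linalg.matrix_power(T, n_periods)
```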
A high-precision nominal flight profile, involving controllers' intentions, is critical for 4D trajectory estimation in modern automatic air traffic control systems. We propose a novel method to effectively improve the accuracy of the nominal flight profile, including the nominal altitude profile and the speed profile. First, considering the characteristics of trajectory data, we developed an improved K-means algorithm. The approach measures the similarity between different altitude profiles by integrating the space warp edit distance algorithm, thereby acquiring several fitted nominal flight altitude profiles. This approach breaks the constraints of traditional K-means algorithms. Second, to eliminate the influence of meteorological factors, we introduced historical gridded binary data to determine the en-route wind speed and temperature via inverse distance weighted interpolation. Finally, we combined the true airspeed determined by speed triangle relationships with the calibrated airspeed determined by an aircraft data model to extract a more accurate nominal speed profile from each cluster, so that the airspeed profiles above and below the airspeed transition altitude could be described separately. Our experimental results showed that the proposed method can obtain a highly accurate nominal flight profile that reflects the actual aircraft flight status.
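A minimal sketch of the inverse distance weighted interpolation step, assuming gridded wind-speed or temperature values and query points along the route are given as plain coordinate arrays; the distance power and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def idw(grid_points, grid_values, query_points, power=2, eps=1e-12):
    """Inverse distance weighted interpolation: estimate wind speed or temperature
    at route points from surrounding gridded (historical GRIB) values."""
    grid_points = np.asarray(grid_points, float)    # (M, 2) coordinates of grid nodes
    grid_values = np.asarray(grid_values, float)    # (M,) value at each node
    query_points = np.asarray(query_points, float)  # (N, 2) points along the route
    out = np.empty(len(query_points))
    for i, q in enumerate(query_points):
        d = np.linalg.norm(grid_points - q, axis=1)
        if np.any(d < eps):                         # query coincides with a grid node
            out[i] = grid_values[d.argmin()]
        else:
            w = 1.0 / d**power
            out[i] = np.sum(w * grid_values) / np.sum(w)
    return out
```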
Background: Diabetic nephropathy (DN) is the most common complication of type 2 diabetes mellitus and the main cause of end-stage renal disease worldwide. Diagnostic biomarkers may allow early diagnosis and treatment of DN to reduce its prevalence and delay its development. Kidney biopsy is the gold standard for diagnosing DN; however, its invasive character is its primary limitation. The machine learning approach provides a non-invasive and specific criterion for diagnosing DN, although traditional machine learning algorithms need to be improved to enhance diagnostic performance. Methods: We applied high-throughput RNA sequencing to obtain the genes related to DN tubular tissues and normal tubular tissues of mice. Machine learning algorithms, random forest, LASSO logistic regression, and principal component analysis, were then used to identify key genes (CES1G, CYP4A14, NDUFA4, ABCC4, ACE). A genetic algorithm-optimized backpropagation neural network (GA-BPNN) was then used to improve the DN diagnostic model. Results: The AUC value of the GA-BPNN model was 0.83 in the training dataset and 0.81 in the validation dataset, while the AUC values of the SVM model in the training dataset and external validation dataset were 0.756 and 0.650, respectively. Thus, the GA-BPNN gave better values than the traditional SVM model. This diagnostic model may support personalized diagnosis and treatment of patients with DN. Immunohistochemical staining further confirmed that the tissue and cell expression of NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4-like 2 (NDUFA4L2) in tubular tissue of DN mice was decreased. Conclusion: The GA-BPNN model has better accuracy than the traditional SVM model and may provide an effective tool for diagnosing DN.
Fractional vegetation cover (FVC) is an important parameter to measure crop growth. In studies of crop growth monitoring, it is very important to extract FVC quickly and accurately. As the most widely used FVC extraction method, the photographic method has the advantages of simple operation and high extraction accuracy. However, when soil moisture and acquisition times vary, the extraction results are less accurate. To accommodate various conditions of FVC extraction, this study proposes a new FVC extraction method that extracts FVC from a normalized difference vegetation index (NDVI) greyscale image of wheat by using a density peak k-means (DPK-means) algorithm. In this study, Yangfumai 4 (YF4) planted in pots and Yangmai 16 (Y16) planted in the field were used as the research materials. With a hyperspectral imaging camera mounted on a tripod, ground hyperspectral images of winter wheat under different soil conditions (dry and wet) were collected at 1 m above the potted wheat canopy. Unmanned aerial vehicle (UAV) hyperspectral images of winter wheat at various stages were collected at 50 m above the field wheat canopy by a UAV equipped with a hyperspectral camera. The pixel dichotomy method and DPK-means algorithm were used to classify vegetation pixels and non-vegetation pixels in NDVI greyscale images of wheat, and the extraction effects of the two methods were compared and analysed. The results showed that extraction by pixel dichotomy was influenced by the acquisition conditions and its error distribution was relatively scattered, while the extraction effect of the DPK-means algorithm was less affected by the acquisition conditions and its error distribution was concentrated. The absolute values of error were 0.042 and 0.044, the root mean square errors (RMSE) were 0.028 and 0.030, and the fitting accuracy R2 of the FVC was 0.87 and 0.93, under dry and wet soil conditions and under various time conditions, respectively. This study found that the DPK-means algorithm was capable of achieving more accurate results than the pixel dichotomy method in various soil and time conditions and was an accurate and robust method for FVC extraction.
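A small sketch of the clustering-based FVC estimate, assuming near-infrared and red band images are available as arrays; standard K-means stands in here for the paper's density-peak-seeded variant (DPK-means), and the band names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def fvc_from_ndvi(nir_band, red_band):
    """Estimate fractional vegetation cover from an NDVI greyscale image by
    clustering pixels into vegetation / non-vegetation classes."""
    nir = nir_band.astype(float)
    red = red_band.astype(float)
    ndvi = (nir - red) / (nir + red + 1e-9)
    pixels = ndvi.reshape(-1, 1)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
    veg_cluster = km.cluster_centers_.argmax()        # higher NDVI centre = vegetation
    return np.mean(km.labels_ == veg_cluster)         # fraction of vegetation pixels
```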
This study aims to realize the sharing of near-infrared analysis models of lignin and holocellulose content in pulp wood across two different batches of spectrometers and proposes the combined algorithms SPA-DS, MCUVE-DS, and SiPLS-DS. The Successive Projection Algorithm (SPA), Monte Carlo Uninformative Variable Elimination (MCUVE), and Synergy Interval Partial Least Squares (SiPLS) algorithms are used to reduce the adverse effects of redundant information in the transfer process of the full-spectrum DS algorithm model. These three algorithms can improve model transfer accuracy and efficiency and reduce the manpower and material consumption required for modeling. The results show that the modeling effects of the characteristic wavelengths screened by the SPA, MCUVE, and SiPLS algorithms are all greatly improved compared with full-spectrum modeling, among which SPA-PLS gives the best prediction, with RPDs above 6.5 for both components. The three wavelength selection methods combined with the DS algorithm are used to transfer the models between the two instruments. Among them, MCUVE combined with the DS algorithm has the best transfer effect. After model transfer, the RMSEP of lignin is 0.701 and the RMSEP of holocellulose is 0.839, a significant improvement over the full-spectrum model transfer values of 0.759 and 0.918.
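A minimal sketch of the direct standardization (DS) step, assuming transfer samples measured on both spectrometers are available as row-wise spectra matrices; with SPA, MCUVE, or SiPLS, the same mapping would be computed only on the selected wavelengths. The function names are illustrative.

```python
import numpy as np

def ds_transfer_matrix(master_spectra, slave_spectra):
    """Direct standardization: learn F so that slave_spectra @ F ≈ master_spectra
    on the transfer samples measured on both instruments."""
    # master_spectra: (n_transfer, p_master), slave_spectra: (n_transfer, p_slave)
    return np.linalg.pinv(slave_spectra) @ master_spectra   # F: (p_slave, p_master)

def standardize(new_slave_spectra, F):
    """Map spectra from the slave instrument into the master space so the
    master calibration model (e.g., PLS on selected wavelengths) can be applied."""
    return new_slave_spectra @ F
```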
Model parameter estimation is a pivotal issue for runoff modeling in ungauged catchments. The nonlinear relationship between model parameters and catchment descriptors is a major obstacle for parameter regionalization, which is the most widely used approach. Runoff modeling was studied in 38 catchments located in the Yellow–Huai–Hai River Basin (YHHRB). The values of the Nash–Sutcliffe efficiency coefficient (NSE), coefficient of determination (R2), and percent bias (PBIAS) indicated the acceptable performance of the soil and water assessment tool (SWAT) model in the YHHRB. Nine descriptors belonging to the categories of climate, soil, vegetation, and topography were used to express the catchment characteristics related to the hydrological processes. The quantitative relationships between the parameters of the SWAT model and the catchment descriptors were analyzed by six regression-based models, including linear regression (LR) equations, support vector regression (SVR), random forest (RF), k-nearest neighbor (kNN), decision tree (DT), and radial basis function (RBF). Each of the 38 catchments was assumed to be an ungauged catchment in turn. Then, the parameters in each target catchment were estimated by the constructed regression models based on the remaining 37 donor catchments. Furthermore, the similarity-based regionalization scheme was used for comparison with the regression-based approach. The results indicated that the runoff with the highest accuracy was modeled by the SVR-based scheme in ungauged catchments. Compared with the traditional LR-based approach, the accuracy of the runoff modeling in ungauged catchments was improved by the machine learning algorithms because of their outstanding capability to deal with nonlinear relationships. The performances of different approaches were similar in humid regions, while the advantages of the machine learning techniques were more evident in arid regions. When the study area contained nested catchments, the best result was obtained with the similarity-based parameter regionalization scheme because of the high catchment density and short spatial distance. The new findings could improve flood forecasting and water resources planning in regions that lack observed data.
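A brief sketch of the leave-one-catchment-out regionalization with SVR, assuming a matrix of catchment descriptors and a matrix of calibrated SWAT parameters (one row per catchment); the SVR settings and names are illustrative.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def regionalize_parameters(descriptors, calibrated_params):
    """Each catchment is treated as ungauged in turn and its SWAT parameters are
    predicted from the descriptors of the remaining donor catchments
    (one SVR per parameter)."""
    n_catch, n_params = calibrated_params.shape
    estimates = np.zeros_like(calibrated_params)
    for target in range(n_catch):
        donors = np.delete(np.arange(n_catch), target)
        for j in range(n_params):
            model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
            model.fit(descriptors[donors], calibrated_params[donors, j])
            estimates[target, j] = model.predict(descriptors[[target]])[0]
    return estimates
```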
In order to address the large errors in current employment quality evaluation, an employment quality evaluation model based on the grey relational degree method and fuzzy C-means (FCM) is proposed. First, the related research on employment quality evaluation is analyzed, the employment quality evaluation index system is established, and the index data are collected and normalized. Then, the weight of each employment quality evaluation index is determined by grey relational analysis, and some unimportant indexes are removed. Finally, the employment quality evaluation model is established using the fuzzy cluster analysis algorithm and compared with other employment quality evaluation models. The test results show that the employment quality evaluation accuracy of the designed model exceeds 93%, its evaluation error meets the requirements of practical application, and its evaluation effect is much better than that of the comparison models. The comparison test verifies the superiority of the model.
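A short sketch of the grey relational analysis step used to weight the evaluation indexes, assuming the index matrix has already been normalized so that larger values are better; the resolution coefficient rho = 0.5 is the conventional default and the formulation is a generic one, not necessarily the paper's exact variant.

```python
import numpy as np

def grey_relational_weights(index_matrix, rho=0.5):
    """Grey relational analysis: weight each evaluation index by its relational
    degree to an ideal reference sequence (the column-wise best value).
    index_matrix has shape (n_samples, n_indices), normalized to [0, 1]."""
    reference = index_matrix.max(axis=0)                     # ideal sequence
    delta = np.abs(index_matrix - reference)                 # absolute differences
    d_min, d_max = delta.min(), delta.max()
    coef = (d_min + rho * d_max) / (delta + rho * d_max)     # relational coefficients
    degree = coef.mean(axis=0)                               # relational degree per index
    return degree / degree.sum()                             # normalized index weights
```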
Clastic rock reservoir is the main reservoir type in the oil and gas field. Archie formula or various conductive models developed on the basis of Archie's formula are usually used to interpret this kind of reservoir, and the three-water model is widely used as well. However, there are many parameters in the three-water model, and some of them are difficult to determine. Most of the determination methods are based on the statistics of large amounts of experimental data. In this study, the authors determine the values of the parameters of the new three-water model based on the nuclear magnetic data and the genetic optimization algorithm. The relative error between the resistivity calculated based on these parameters and the resistivity measured experimentally at 100% water content is 0.9024. The method studied in this paper can be easily applied without much experimental data. It can provide a reference for other regions to determine the parameters of the new three-water model.
This paper aims to build an employee attrition classification model based on the stacking algorithm. After data cleaning and preprocessing, an oversampling algorithm is applied to address the issue of data imbalance, and the random forest feature importance ranking method is used to resolve the overfitting problem. Then, different algorithms are used to establish classification models as control experiments, and R-squared indicators are used for comparison. Finally, the stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
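A minimal sketch of the described pipeline using common open-source stand-ins (imbalanced-learn's SMOTE for oversampling and scikit-learn's StackingClassifier); the base learners, the number of retained features, and all hyperparameters are illustrative, not the paper's.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def build_attrition_model(X, y, n_keep=20):
    """Oversample the minority class, keep the top-ranked features by random forest
    importance, then fit a stacking ensemble. X is a numeric feature array."""
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_res, y_res)
    keep = np.argsort(rf.feature_importances_)[::-1][:n_keep]   # top-ranked features
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                    ("dt", DecisionTreeClassifier(max_depth=6, random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000))
    stack.fit(X_res[:, keep], y_res)
    return stack, keep
```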
With the increasing variety of application software in meteorological satellite ground systems, how to provision reasonable hardware resources and improve software efficiency is receiving more and more attention. In this paper, a software classification method based on software operating characteristics is proposed. The method uses run-time resource consumption to describe the software running characteristics. First, principal component analysis (PCA) is used to reduce the dimension of the software running feature data and to interpret the software characteristic information. Then a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the classification is combined with the results of the principal component analysis to explain the operating characteristics of each class of software. The result serves as the basis for optimizing the allocation of hardware resources and improving the efficiency of software operation.
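A short sketch of the PCA-plus-K-means classification on run-time resource metrics (CPU, memory, I/O, etc.), assuming one feature row per software instance; standard K-means stands in for the paper's modified variant, and the component and class counts are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def classify_software(resource_metrics, n_components=3, n_classes=4):
    """Cluster software by run-time resource consumption: PCA reduces the feature
    dimension, then K-means groups the software into operating-characteristic classes."""
    X = StandardScaler().fit_transform(resource_metrics)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X)
    labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(scores)
    # the loadings and explained variance help interpret what each component measures
    return labels, pca.components_, pca.explained_variance_ratio_
```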