Funding: This work was supported by the Second Tibetan Plateau Scientific Expedition and Research Program (Grant no. 2019QZKK0904), the Natural Science Foundation of Hebei Province (Grant no. D2022403032), and the S&T Program of Hebei (Grant no. E2021403001).
Abstract: The selection of important factors in machine learning-based susceptibility assessments is crucial to obtaining reliable susceptibility results. In this study, metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province, China, using machine learning algorithms. In total, 133 historical debris flow records and 16 related factors were selected. The support vector machine (SVM) was first used as the base classifier, and a hybrid model was then introduced in a two-step process. First, the particle swarm optimization (PSO) algorithm was employed to select the SVM model hyperparameters. Second, two feature selection algorithms, namely principal component analysis (PCA) and PSO, were integrated into the PSO-based SVM model, which generated the PCA-PSO-SVM and FS-PSO-SVM models, respectively. Three statistical metrics (accuracy, recall, and specificity) and the area under the receiver operating characteristic curve (AUC) were employed to evaluate and validate the performance of the models. The results indicated that the feature selection-based models exhibited the best performance, followed by the PSO-based SVM and SVM models. Moreover, the FS-PSO-SVM model performed better than the PCA-PSO-SVM model, showing the highest AUC, accuracy, recall, and specificity values in both the training and testing processes. It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results. Moreover, the PSO algorithm was found to be not only an effective tool for hyperparameter optimization but also a useful feature selection algorithm for improving the prediction accuracy of machine learning-based debris flow susceptibility models. The high and very high debris flow susceptibility zones cover approximately 38.01% of the study area, where debris flows may occur under intensive human activity and heavy rainfall events.
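The four evaluation measures named in the abstract above have standard definitions, which the following minimal sketch spells out. It is not the authors' code: the labels, scores, and the 0.5 decision threshold are made-up illustration values, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = debris flow, 0 = no debris flow)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of true positives among all actual positives (sensitivity)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of true negatives among all actual negatives."""
    return tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg) and is
    only meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data (assumption, not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, tn, fn), recall(tp, fn), specificity(tn, fp), auc(y_true, scores))
```

Accuracy, recall, and specificity depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is why the two kinds of measure are usually reported together.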
Abstract: Ganoderma lucidum (G. lucidum) spores, a valuable Chinese herbal medicine, have vast market prospects owing to their bioactivities and medicinal efficacy. This study aims to develop an effective and simple analytical method to distinguish G. lucidum spores from the fruiting body, which is essential for quality control and fast discrimination of raw materials of Chinese herbal medicine. Attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy combined with appropriate chemometric methods, including penalized discriminant analysis, principal component discriminant analysis and partial least squares discriminant analysis, proved to be a rapid and powerful tool for discriminating G. lucidum spores from the fruiting body, with a classification accuracy of 99%. The model yields a well-performing selection of informative spectral absorption bands, which improves the classification accuracy, reduces the model complexity and enhances the quantitative interpretation of the chemical constituents of G. lucidum spores with regard to their anticancer effects.
Funding: This work is supported by the National Natural Science Foundation of China (Grant Nos. 61876089, 61403206), the Science and Technology Program of the Ministry of Housing and Urban-Rural Development (2019-K-141), the Entrepreneurial Team of Sponge City (2017R02002), and the Innovation and Entrepreneurship Training Program for College Students.
Abstract: In many fields such as signal processing, machine learning, pattern recognition and data mining, it is common practice to process datasets containing huge numbers of features. In such cases, feature selection (FS) is often involved. Meanwhile, owing to their excellent global search ability, evolutionary computation (EC) techniques have been widely applied to FS. As a powerful global search method that computes faster than other EC algorithms, particle swarm optimization (PSO) can solve feature selection problems well. However, when the number of features is large, the efficiency of PSO drops significantly, and much work has been done to improve this situation. In addition, many studies have shown that an appropriate population initialization can effectively help with this problem. Building on PSO, this paper introduces a new feature selection method with filter-based population initialization. The proposed algorithm first uses principal component analysis (PCA) to measure the importance of features; then, based on the sorted feature information, a population initialization method using threshold selection and mixed initialization is proposed. Experiments were performed on several datasets and compared with several related algorithms. The experimental results show that the accuracy of PSO on feature selection problems is significantly improved by the proposed method.
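The abstract above does not give the details of its threshold selection or mixed initialization, so the sketch below is only one plausible reading, not the paper's algorithm. It assumes per-feature importance scores are already available (for example from PCA), that part of the swarm is seeded from features scoring at or above a threshold, and that the rest is random. The function name and the parameters `seeded_fraction` and `p_keep` are hypothetical.

```python
import random


def init_population(importance, n_particles, threshold,
                    seeded_fraction=0.5, p_keep=0.9, seed=0):
    """Build binary particles (1 = feature selected) for a feature-selection PSO.

    Assumed scheme, for illustration only:
      * the first `seeded_fraction` of particles may only select features whose
        importance is >= `threshold`, each kept with probability `p_keep`;
      * the remaining particles are drawn uniformly at random, to preserve
        diversity in the swarm.
    """
    rng = random.Random(seed)
    n_features = len(importance)
    n_seeded = int(n_particles * seeded_fraction)
    population = []
    for k in range(n_particles):
        if k < n_seeded:
            # Filter-guided particle: low-importance features are never selected.
            particle = [1 if score >= threshold and rng.random() < p_keep else 0
                        for score in importance]
        else:
            # Random particle: every feature has a 50% chance of being selected.
            particle = [rng.randint(0, 1) for _ in range(n_features)]
        population.append(particle)
    return population


# Made-up importance scores for four features (assumption).
scores = [0.50, 0.30, 0.15, 0.05]
swarm = init_population(scores, n_particles=6, threshold=0.2)
for particle in swarm:
    print(particle)
```

Mixing guided and random particles is a common way to bias the search toward promising features without losing the global exploration that PSO relies on.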
Abstract: Machine learning consists in the creation and development of algorithms that allow a machine to learn by itself, gradually improving its behavior over time. The more representative the features of the dataset used to describe the problem, the more effective this learning is. An important objective is therefore the correct selection (and, possibly, reduction in number) of the most relevant features, which is typically carried out through dimensionality reduction tools such as principal component analysis (PCA), which is not linear in the more general case. In this work, an approach to calculating the reduced space of the PCA is proposed through the definition and implementation of appropriate artificial neural network models, which makes it possible to obtain an accurate and at the same time flexible reduction of the dimensionality of the problem.
Funding: Supported by the National Natural Science Foundation of China (50675167) and a Foundation for the Author of National Excellent Doctoral Dissertation of China (200535).
Abstract: The support vector classifier (SVC) has clear advantages for small-sample learning problems with high dimensions, in particular better generalization ability. However, there is some redundancy among the high dimensions of the original samples, and the main features of the samples may be extracted first to improve the performance of the SVC. Principal component analysis (PCA) is employed to reduce the feature dimensions of the original samples and to pre-select the main features efficiently, and an SVC is constructed in the selected feature space to improve the learning speed and identification rate of the SVC. Furthermore, a heuristic genetic algorithm-based automatic model selection is proposed to determine the hyperparameters of the SVC and to evaluate the performance of the learning machines. Experiments on the Heart and Adult benchmark data sets demonstrate that the proposed PCA-based SVC not only reduces the test time drastically but also improves the identification rate effectively.
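The dimensionality reduction step that precedes the classifier in the abstract above can be illustrated with a small, dependency-free sketch. It covers the PCA projection only (the SVC and the genetic model selection are omitted) and uses power iteration to find the first principal component, which is a convenient stand-in and not necessarily how the paper computes it. The six 2-D samples are made up.

```python
import math


def center(rows):
    """Subtract the per-column mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=100):
    """Leading eigenvector and eigenvalue of `cov` by power iteration.

    Assumes the start vector (1, 0, ..., 0) is not orthogonal to the leading
    eigenvector and that the covariance matrix is not all zeros.
    """
    d = len(cov)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


def project(centered, v):
    """Reduce each sample to its coordinate along the component v."""
    return [sum(a * b for a, b in zip(r, v)) for r in centered]


# Made-up samples that vary mostly along the diagonal direction (assumption).
samples = [[2.0, 2.0], [-2.0, -2.0], [1.0, 1.0], [-1.0, -1.0], [0.5, -0.5], [-0.5, 0.5]]
centered = center(samples)
cov = covariance(centered)
component, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
reduced = project(centered, component)
print(component, eigenvalue, explained)
```

For this data the covariance matrix is [[2.1, 1.9], [1.9, 2.1]], so the first component points along the diagonal and carries about 95% of the total variance. The one-dimensional `reduced` values are what a downstream classifier would be trained on in place of the original two features.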
Abstract: The eigenface method, which uses principal component analysis (PCA), has been the standard and popular method in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA is extended by MAs, where the former is used for feature extraction/dimensionality reduction and the latter for feature selection. Simulations were performed on the ORL and YaleB face databases using the Euclidean norm as the classifier. In terms of recognition rate, PCA-MA completely outperforms the eigenface method. We also compared the performance of PCA extended with a genetic algorithm (PCA-GA) with our proposed PCA-MA method; the results clearly established the superiority of PCA-MA over PCA-GA. We further extended the linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed a significant improvement in recognition rate with fewer features. This paper also compares the performance of the PCA-MA, LDA-MA and KPCA-MA approaches.
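Abstract: In traditional principal component-based redundant variable screening algorithms, the final key-variable screening index has to be judged with the help of expert experience, which introduces human subjectivity and makes the model's predictions unstable. This paper therefore proposes a Key Variable Screening Algorithm Combining Principal Component and Entropy Weight (KVSA-PCA-EP). The algorithm first computes a first key-variable screening index using the traditional principal component-based redundant variable screening algorithm. It then computes a second key-variable screening index from the variance of each original variable and the entropy of the target variable. Finally, the ratio of the second index to the first is taken as the final key-variable screening index. In experiments on the public dataset METERC, the proposed algorithm improves the F1 score by about 5% over the traditional principal component-based redundant variable screening algorithm, which supports its advantage.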
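The abstract above names the ingredients of the final index (variable variance, target entropy, and a ratio of two indices) but not how variance and entropy are combined into the second index. The sketch below therefore covers only the pieces that are stated, and treats both indices as given inputs. Reading "entropy" as the Shannon entropy of discrete target labels is an assumption, and all function names and numbers are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a discrete target variable."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def sample_variance(values):
    """Unbiased sample variance of one original variable."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def final_index(first_index, second_index):
    """Per-variable ratio of the second screening index to the first."""
    return [second / first for first, second in zip(first_index, second_index)]


# Made-up values for illustration only (assumption).
target_entropy = shannon_entropy([0, 0, 1, 1])      # balanced binary target
variable_variance = sample_variance([1.0, 2.0, 3.0, 4.0])
ratios = final_index([2.0, 4.0], [1.0, 1.0])        # two hypothetical variables
print(target_entropy, variable_variance, ratios)
```

Because the final index is a fixed ratio of two computed quantities, variables can be ranked by it directly, which is how the method avoids the expert judgment that the traditional index requires.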