Key variable identification for classification is related to many troubleshooting problems in the process industries. Recursive feature elimination based on support vector machines (SVM-RFE) was recently proposed for feature selection in cancer diagnosis. In this paper, SVM-RFE is applied to key variable selection in fault diagnosis, and an accelerated SVM-RFE procedure based on a heuristic criterion is proposed. Data from the Tennessee Eastman process (TEP) simulator are used to evaluate the effectiveness of key variable selection using accelerated SVM-RFE (A-SVM-RFE). A-SVM-RFE integrates computational speed and algorithmic effectiveness into a consistent framework: it correctly identifies the key variables while remaining computationally fast. Compared with contribution charts combined with principal component analysis (PCA) and with two other SVM-RFE algorithms, A-SVM-RFE performs better, which makes it better suited to industrial application.
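A minimal Python sketch of the SVM-RFE loop described above. It assumes a linear SVM without a bias term, trained by Pegasos-style stochastic subgradient descent, with features ranked by their squared weights w_j². The `step` parameter removes several features per round. This is a generic speed-up taken from common RFE implementations, not the heuristic criterion of A-SVM-RFE, which the abstract does not specify. All function names and the synthetic demo data are hypothetical.

```python
import random

# Illustrative sketch: function names and demo data are hypothetical.


def train_linear_svm(X, y, lam=0.1, iters=3000, seed=0):
    """Linear SVM (no bias) trained with Pegasos-style subgradient steps.

    X is a list of feature vectors and y a list of labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # Shrink the weights: step from the L2 regularizer
        w = [(1.0 - eta * lam) * wj for wj in w]
        if margin < 1.0:
            # Hinge loss is active for this sample: step toward y_i * x_i
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep, step=1):
    """Recursively drop the features with the smallest squared SVM weight."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = train_linear_svm([[row[j] for j in remaining] for row in X], y)
        # Positions in `remaining`, ordered from least to most important
        order = sorted(range(len(remaining)), key=lambda k: w[k] ** 2)
        n_drop = min(step, len(remaining) - n_keep)
        dropped = {remaining[k] for k in order[:n_drop]}
        remaining = [j for j in remaining if j not in dropped]
    return remaining


# Synthetic demo: feature 0 carries the label, features 1-3 are uniform noise
rng = random.Random(42)
demo_y = [1 if rng.random() < 0.5 else -1 for _ in range(200)]
demo_X = [[2.0 * yi] + [rng.uniform(-1, 1) for _ in range(3)] for yi in demo_y]
print(svm_rfe(demo_X, demo_y, n_keep=1))
print(svm_rfe(demo_X, demo_y, n_keep=1, step=2))
```

RFE is expensive because the SVM is retrained after every elimination round. Raising `step` reduces the number of retraining rounds, at the cost of a coarser ranking.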
In the spectral analysis of laser-induced breakdown spectroscopy (LIBS), the original spectral data contain both abundant characteristic spectral lines and severe interference. Here, a feature selection method called recursive feature elimination based on ridge regression (Ridge-RFE) is proposed for the original spectral data to make full use of the valid spectral information. In Ridge-RFE, the absolute value of each ridge regression coefficient serves as the criterion for screening spectral features. At each step of recursive feature elimination (RFE), the feature whose coefficient has the smallest absolute value is removed from the current subset. The selected features are then used as inputs to a partial least squares regression (PLS) model. The Ridge-RFE-based PLS model was used to measure Fe, Si, Mg, Cu, Zn and Mn in 51 aluminum alloy samples. Its root mean square error of prediction decreased greatly compared with a PLS model using the full spectrum as input. The overall results demonstrate that Ridge-RFE efficiently eliminates redundant features, yields better quantitative analysis results from the PLS model, and improves model generalization ability.
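The Ridge-RFE criterion can be sketched in plain Python as follows. The sketch assumes closed-form ridge regression without an intercept. It also assumes the features are already on comparable scales, because coefficient magnitudes are only comparable when the inputs are scaled similarly. The PLS stage is not shown; the returned indices would become the PLS inputs. The function names, the regularization strength `alpha`, and the demo data are illustrative assumptions.

```python
import random

# Illustrative sketch: function names, alpha and demo data are hypothetical.


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_coef(X, y, alpha=1e-3):
    """Closed-form ridge regression without intercept: (X^T X + alpha*I) w = X^T y."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def ridge_rfe(X, y, n_keep, alpha=1e-3):
    """Remove the feature with the smallest |ridge coefficient| until n_keep remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = ridge_coef([[row[j] for j in remaining] for row in X], y, alpha)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        remaining.pop(weakest)
    return remaining


# Demo: the target depends only on features 0 and 3 (noise-free, for clarity)
rng = random.Random(1)
demo_X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(50)]
demo_y = [3.0 * row[0] - 2.0 * row[3] for row in demo_X]
print(ridge_rfe(demo_X, demo_y, n_keep=2))
```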
In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear, deep-learning (LSTM) and ensemble-learning (LightGBM) models. The models were trained with four different feature sets, and their performance was evaluated using accuracy and F-measure. The first experiments used each stock's own features directly as model inputs. The second experiments used stock features reduced with variational autoencoders (VAEs). In the last experiments, the features of the other banking stocks were also given as model inputs, to capture their effect on an individual stock's performance. These other-stock features were combined with both the own stock features (allstock_own) and the VAE-reduced stock features (allstock_VAE), and the expanded feature sets were reduced by recursive feature elimination. The highest accuracy, 0.685, was achieved with allstock_own and an LSTM with attention, while allstock_VAE with the same model reached 0.675. Although the two feature types gave similar classification results, allstock_VAE used nearly 16.67% fewer features than allstock_own. Across all experiments, the models trained with allstock_own and allstock_VAE achieved higher accuracy than those using individual stock features. The results obtained with VAE-reduced stock features were similar to those obtained with each stock's own features.
Multi-source domain adaptation (MSDA) learns knowledge from multiple source domains and transfers it to an unlabeled target domain. Most existing methods address this problem by minimizing the domain shift through auxiliary distribution-alignment objectives, which reduce the effect of domain-specific features. However, without explicitly modeling the domain-specific features, it is hard to guarantee that the domain-invariant representation extracted from the input domains contains as little domain-specific information as possible. In this work, we present a different perspective on MSDA that uses feature elimination to reduce the influence of domain-specific features. We design two different ways to extract the domain-specific features and the total features. Domain-invariant representations are then constructed by eliminating the domain-specific features from the total features. Experimental results on several domain adaptation datasets demonstrate the effectiveness of the method and the generalization ability of the model.
Field-road segmentation is a key task in processing agricultural machinery trajectories. To improve its accuracy, this study proposes DR-XGBoost, an XGBoost model based on dual feature extraction and recursive feature elimination. DR-XGBoost takes only a small number of agricultural machinery trajectory features as input. First, the model applies our dual feature extraction method to rapidly expand the number of features, then extracts local trajectory features using a time window and feature extraction operators. Second, it applies recursive feature elimination to remove redundant features, judged by their effect on segmentation performance, which also reduces the computational cost of training. Third, it trains XGBoost to perform the trajectory segmentation. To evaluate DR-XGBoost, we conducted a series of experiments on a real agricultural machinery trajectory dataset. The model achieves a Macro-F1 score of 98.2% on this dataset, 10.9% higher than the previous state of the art. DR-XGBoost fills a knowledge gap in trajectory feature extraction for agricultural machinery and provides a reasonable and effective feature selection scheme for field-road segmentation.
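The following Python sketch shows one possible reading of the two-stage extraction. The first stage derives a per-point signal, here speed computed from hypothetical `(t, x, y)` samples. The second stage summarises a centred time window around each point with several operators. Using speed as the signal, and mean, standard deviation, minimum and maximum as the operators, are assumptions for illustration only; the abstract does not list the paper's operators.

```python
import math

# Illustrative sketch: signal choice, operators, names and data are hypothetical.


def speeds(points):
    """Per-point speed from (t, x, y) samples; the first point reuses the first segment's speed."""
    seg = [math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
           for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:])]
    return [seg[0]] + seg if seg else []


def window_features(values, half_width=2):
    """Summarise a centred time window around each point with four operators."""
    n = len(values)
    feats = []
    for i in range(n):
        win = values[max(0, i - half_width):min(n, i + half_width + 1)]
        mean = sum(win) / len(win)
        std = math.sqrt(sum((v - mean) ** 2 for v in win) / len(win))
        feats.append([mean, std, min(win), max(win)])
    return feats


# Demo trajectory: slow along the x axis, then faster
track = [(0, 0.0, 0.0), (1, 1.0, 0.0), (2, 2.0, 0.0), (3, 5.0, 0.0), (4, 8.0, 0.0)]
v = speeds(track)
feats = window_features(v, half_width=1)
for row in feats:
    print(row)
```

Repeating the window step with several `half_width` values multiplies the feature count. This is the kind of expansion that a later recursive-elimination stage would then prune.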
Artificial intelligence (AI) technology has become integral to medicine and healthcare, particularly in human activity recognition (HAR) applications such as fitness and rehabilitation tracking. This study introduces a coupling analysis framework that integrates four AI-enabled models, combining machine learning (ML) and deep learning (DL) approaches, to evaluate their effectiveness in HAR. The analytical dataset comprises 561 features from the UCI-HAR database, which forms the basis for training the models. The MHEALTH database is used to replicate the modeling process for comparison. The WISDM database, known for its challenging features, is included to test the framework's resilience and adaptability. The ML-based models use an adaptive neuro-fuzzy inference system (ANFIS), a support vector machine (SVM), and a random forest (RF). The DL-based model uses a one-dimensional convolutional neural network (1dCNN) to automate feature extraction. The recursive feature elimination (RFE) algorithm, which uses an ML-based estimator to eliminate features that contribute little, helps identify the optimal features for improving model performance. With careful feature engineering, the best accuracies of the ANFIS, SVM, RF, and 1dCNN models reach approximately 90%, 96%, 91%, and 93%, respectively. On the MHEALTH dataset, the 1dCNN model achieves 100% accuracy, while the RF, SVM, and ANFIS models with selected features achieve 99.8%, 99.7%, and 96.5%, respectively. On the WISDM dataset, the DL-based and ML-based models reach accuracies of 91.4% and 87.3%, respectively, consistent with prior research. In conclusion, the proposed framework yields well-performing HAR models, which indicates its suitability for integration into AI-driven healthcare services.
This study aims to enable prompt classification of increasingly sophisticated malware. To do so, we analyzed the important features of malware and the relative importance of the selected features under a learning model, to assess how those important features were identified. First, analysis features were extracted with Cuckoo Sandbox, an open-source malware analysis tool, and divided into five categories based on the extracted information. The 804 extracted features were then reduced by 70%, keeping only those most suitable for malware classification. This reduction used recursive feature elimination, a learning-model-based feature selection method. The contribution of each remaining feature was then assessed with a random forest classifier. The results showed that system call features accounted for most of the selected features. Finally, the malware type could be identified accurately using only 36 to 76 features for each of the four malware types with the most analysis samples: Trojan, Adware, Downloader, and Backdoor.
Software-defined networking (SDN) has emerged as a promising option for the future growth of the internet. SDN increases the flexibility and transparency of a managed, centrally controlled network. However, these advantages also create a more vulnerable environment with substantial risks, leading to network disruptions, system paralysis, online banking fraud, and theft. These issues can seriously harm organizations, enterprises, and even economies. Accurate, high-performance, real-time detection systems are needed to counter these threats. Extending intelligent machine learning methods to intrusion detection systems (IDS) in SDN has attracted many researchers over the last decade. In this paper, a novel HFS-LGBM IDS is proposed for SDN. First, a two-phase hybrid feature selection (HFS) algorithm is applied to reduce the data dimension and obtain an optimal feature subset. In the first phase, correlation-based feature selection (CFS) produces a candidate feature subset; in the second phase, random forest recursive feature elimination (RF-RFE) yields the optimal feature set. A LightGBM classifier is then used to detect and classify different types of attacks. Experimental results on the NSL-KDD dataset show that the proposed system outperforms existing methods in accuracy, precision, recall, and F-measure.
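The first HFS phase relies on correlation-based feature selection. Below is a Python sketch of CFS using Hall's merit heuristic, Merit_S = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), combined with a greedy forward search. Using Pearson correlation for both the feature-class and feature-feature terms is a simplifying assumption; CFS is often run with symmetric uncertainty on discretized data instead. The RF-RFE phase is omitted, and all names and demo data are hypothetical.

```python
import math
import random

# Illustrative sketch: function names and demo data are hypothetical.


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cfs_merit(subset, cols, target):
    """Hall's CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(cols[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(cols, target, tol=1e-9):
    """Greedy forward search: add the feature that raises the merit most; stop when none does."""
    selected, best = [], 0.0
    candidates = list(range(len(cols)))
    while candidates:
        f = max(candidates, key=lambda c: cfs_merit(selected + [c], cols, target))
        merit = cfs_merit(selected + [f], cols, target)
        if merit <= best + tol:
            break
        selected.append(f)
        candidates.remove(f)
        best = merit
    return selected


# Demo: column 0 equals the class, column 1 is a redundant copy, column 2 is noise
rng = random.Random(7)
label = [float(rng.random() < 0.5) for _ in range(100)]
noise = [rng.gauss(0, 1) for _ in range(100)]
columns = [label[:], label[:], noise]
print(cfs_forward(columns, label))
```

The redundant copy in column 1 is rejected because its correlation with column 0 raises the denominator as much as it raises the numerator. This is how the merit penalizes redundancy.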
This study develops two hybrid models that optimize the conditioning factors and enhance the predictive ability of landslide susceptibility models. A landslide inventory map was created with 406 historical landslides and 2030 non-landslide points, randomly divided into a training set (70%) and a testing set (30%). Twenty-two factors were initially selected to build a landslide factor database. GeoDetector and recursive feature elimination (RFE) were applied for factor optimization, to reduce information redundancy and collinearity in the data. The frequency ratio method, a multicollinearity test, and the interaction detector were then used to analyze and evaluate the optimized factors. Next, a random forest (RF) model was used to create landslide susceptibility maps with the original and the optimized factors. The resulting hybrid models, GeoDetector-RF and RFE-RF, were evaluated and compared using accuracy and the area under the receiver operating characteristic curve (AUC). The accuracies of the two hybrid models (0.868 for GeoDetector-RF and 0.869 for RFE-RF) were higher than that of the RF model (0.860), indicating that hybrid models with factor optimization are highly reliable and predictive. RFE-RF and GeoDetector-RF also had higher AUC values (0.863 and 0.860, respectively) than RF (0.853). These results confirm that factor optimization methods can improve the performance of landslide susceptibility models.
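The frequency ratio used to evaluate the optimized factors compares each factor class's share of landslides with its share of the study area: FR = (landslide cells in class / all landslide cells) / (cells in class / all cells). Below is a minimal Python sketch. It assumes each map cell has already been assigned a class for one conditioning factor. The function name and demo data are hypothetical.

```python
from collections import Counter

# Illustrative sketch: function name and demo data are hypothetical.


def frequency_ratio(classes, is_landslide):
    """Frequency ratio of each class of one conditioning factor.

    classes[i] is the factor class of map cell i; is_landslide[i] marks landslide cells.
    FR > 1 means landslides are over-represented in that class relative to its area.
    """
    total_cells = len(classes)
    total_slides = sum(1 for s in is_landslide if s)
    cells = Counter(classes)
    slides = Counter(c for c, s in zip(classes, is_landslide) if s)
    return {c: (slides[c] / total_slides) / (n / total_cells) for c, n in cells.items()}


# Demo: 10 cells of a hypothetical slope factor with two classes
slope_class = ["gentle"] * 6 + ["steep"] * 4
landslide = [False] * 5 + [True] + [True, True, True, False]
fr = frequency_ratio(slope_class, landslide)
print(fr)
```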
BACKGROUND Despite advances in operative technique and postoperative management, postoperative pancreatic fistula (POPF) remains a life-threatening complication after pancreatoduodenectomy (PD). Some studies have attempted to predict POPF preoperatively or intraoperatively, but their accuracy is questionable. Artificial intelligence (AI) technology is widely used in medicine, but few studies have applied it to outcomes after PD. AIM To develop a risk prediction platform for POPF using an AI model. METHODS Medical records of 1769 patients who underwent PD at Samsung Medical Center from 2007 to 2016 were reviewed. A total of 38 variables were fed into AI-driven algorithms. The algorithms tested for the risk prediction platform were random forest (RF) and a neural network (NN), each with or without recursive feature elimination (RFE). Missing values were handled by median imputation. The area under the curve (AUC) was calculated to assess each algorithm's discriminative power for POPF prediction. RESULTS POPF occurred in 221 patients (12.5%) according to the 2016 International Study Group of Pancreatic Fistula definition. After median imputation, the AUCs using all 38 variables were 0.68 ± 0.02 with RF and 0.71 ± 0.02 with NN. The maximum AUC, obtained with NN and RFE, was 0.74. The AI algorithm identified sixteen risk factors for POPF: pancreatic duct diameter, body mass index, preoperative serum albumin, lipase level, amount of intraoperative fluid infusion, age, platelet count, extrapancreatic tumor location, combined venous resection, coexisting pancreatitis, neoadjuvant radiotherapy, American Society of Anesthesiologists score, sex, soft pancreatic texture, underlying heart disease, and preoperative endoscopic biliary decompression. We developed a web-based POPF prediction platform, which is freely available at http://popfrisk.smchbp.org. CONCLUSION This study is the first to predict POPF from multiple risk factors using AI. The platform is reliable (AUC 0.74), so it could be used to select patients who need particularly intensive therapy and to establish an effective treatment strategy preoperatively.
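Two steps from the METHODS can be sketched in plain Python. The first is per-column median imputation of missing values, which are encoded here as `None` (an assumption). The second is AUC, computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. The sketch illustrates the two operations only; it is not the study's pipeline. Names and values are hypothetical.

```python
import statistics

# Illustrative sketch: function names and demo values are hypothetical.


def median_impute(rows):
    """Fill None entries with the median of the observed values in the same column."""
    medians = [statistics.median(v for v in col if v is not None) for col in zip(*rows)]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Demo with made-up values: two variables, one missing value in each
records = [[1.0, None], [3.0, 4.0], [None, 6.0], [5.0, 8.0]]
print(median_impute(records))
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

In practice, the medians would be computed on the training split only and then applied to the test split, so that test data do not leak into preprocessing.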
Electroencephalographic (EEG)-based emotion recognition has recently received increasing attention in human-computer interaction (HCI). However, several challenges remain in building a generalized emotion recognition model. One is that an EEG-based emotion classifier trained on one task has difficulty handling other tasks, an issue that has received little attention. This study examines whether feature selection can address this challenge. Emotions were elicited in 12 healthy volunteers through picture-induced and video-induced tasks. First, a support vector machine (SVM) classifier was evaluated on the picture-induced and video-induced tasks under two conditions: within-task (trained and tested on the same task) and cross-task (trained on one task and tested on the other). Within-task classification performed fairly well (accuracy: 51.6% for the picture task and 94.4% for the video task), whereas cross-task classification deteriorated to around 44%. When trained and tested on the most robust feature subset selected by SVM recursive feature elimination (SVM-RFE), the cross-task classifier improved significantly, to above 68%. These results suggest that cross-task emotion recognition is feasible with appropriate methods. They also bring EEG-based emotion recognition models closer to discriminating emotional states in any task.
The prevalence of diabetes, a condition in which the body cannot properly regulate blood glucose, is increasing. Accurate prediction of diabetes is an important research area, and many researchers have proposed data mining and machine learning techniques for it. Feature selection is a key preprocessing step in prediction, because using only the features relevant to the disease improves prediction accuracy. Selecting the right features from the full feature set is complicated, and many researchers focus on it to build highly accurate predictive models. In this work, recursive feature elimination, a wrapper-based feature selection method, is combined with ridge regression (L2 regularization). The result is a hybrid L2-regularized feature selection algorithm that addresses the overfitting problem. Overfitting is a major problem in feature selection: when the training data are small, the model fits them too closely and does not fit new data. Ridge regression is mainly used to counter this problem. Features are selected with the proposed method, and a random forest classifier then classifies the data on the basis of the selected features. This work uses the Pima Indians Diabetes dataset and compares the results with those of existing algorithms. The proposed algorithm predicts diabetes with 100% accuracy and an area under the curve of 97%, outperforming the existing algorithms.
Funding (accelerated SVM-RFE for fault diagnosis): Supported by the China 973 Program (No. 2002CB312200), the National Natural Science Foundation of China (Nos. 60574019 and 60474045), the Key Technologies R&D Program of Zhejiang Province (No. 2005C21087), and the Academician Foundation of Zhejiang Province (No. 2005A1001-13).
Funding (Ridge-RFE for LIBS spectral analysis): Supported by the National Key Research and Development Program of China (No. 2016YFF0102502), the Key Research Program of Frontier Sciences, CAS (No. QYZDJ-SSW-JSC037), the Youth Innovation Promotion Association, CAS, and the Liao Ning Revitalization Talents Program (No. XLYC1807110).
Funding (multi-source domain adaptation by feature elimination): Supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61876130 and 61932009).
Funding (DR-XGBoost field-road segmentation): This work was financially supported by the National Key Research and Development Program of China (Grant No. 2021YFB3901300), the National Precision Agriculture Application Project (Grant No. JZNYYY001), and the National Innovation Training Project for University in China (Grant No. 202310019034).
Funding (AI-enabled HAR framework): Funded by the National Science and Technology Council, Taiwan (Grant No. NSTC 112-2121-M-039-001) and by China Medical University (Grant No. CMU112-MF-79).
Funding (malware classification): Supported by the Research Program through the National Research Foundation of Korea, NRF-2018R1D1A1B07050864.
文摘This study was conducted to enable prompt classification of malware,which was becoming increasingly sophisticated.To do this,we analyzed the important features of malware and the relative importance of selected features according to a learning model to assess how those important features were identified.Initially,the analysis features were extracted using Cuckoo Sandbox,an open-source malware analysis tool,then the features were divided into five categories using the extracted information.The 804 extracted features were reduced by 70%after selecting only the most suitable ones for malware classification using a learning model-based feature selection method called the recursive feature elimination.Next,these important features were analyzed.The level of contribution from each one was assessed by the Random Forest classifier method.The results showed that System call features were mostly allocated.At the end,it was possible to accurately identify the malware type using only 36 to 76 features for each of the four types of malware with the most analysis samples available.These were the Trojan,Adware,Downloader,and Backdoor malware.
文摘Software Defined Networking(SDN)has emerged as a promising and exciting option for the future growth of the internet.SDN has increased the flexibility and transparency of the managed,centralized,and controlled network.On the other hand,these advantages create a more vulnerable environment with substantial risks,culminating in network difficulties,system paralysis,online banking frauds,and robberies.These issues have a significant detrimental impact on organizations,enterprises,and even economies.Accuracy,high performance,and real-time systems are necessary to achieve this goal.Using a SDN to extend intelligent machine learning methodologies in an Intrusion Detection System(IDS)has stimulated the interest of numerous research investigators over the last decade.In this paper,a novel HFS-LGBM IDS is proposed for SDN.First,the Hybrid Feature Selection algorithm consisting of two phases is applied to reduce the data dimension and to obtain an optimal feature subset.In thefirst phase,the Correlation based Feature Selection(CFS)algorithm is used to obtain the feature subset.The optimal feature set is obtained by applying the Random Forest Recursive Feature Elimination(RF-RFE)in the second phase.A LightGBM algorithm is then used to detect and classify different types of attacks.The experimental results based on NSL-KDD dataset show that the proposed system produces outstanding results compared to the existing methods in terms of accuracy,precision,recall and f-measure.
Abstract: The present study aims to develop two hybrid models that optimize the factors and enhance the predictive ability of landslide susceptibility models. For this purpose, a landslide inventory map was created with 406 historical landslides and 2030 non-landslide points, which were randomly divided into a training dataset (70%) and a testing dataset (30%). Twenty-two factors were initially selected to establish a landslide factor database. We applied GeoDetector and the recursive feature elimination (RFE) method for factor optimization, to reduce information redundancy and collinearity in the data. Thereafter, the frequency ratio method, a multicollinearity test, and the interaction detector were used to analyze and evaluate the optimized factors. Subsequently, the random forest (RF) model was used to create landslide susceptibility maps with the original and the optimized factors. The resulting hybrid models, GeoDetector-RF and RFE-RF, were evaluated and compared using the area under the receiver operating characteristic curve (AUC) and accuracy. The accuracies of the two hybrid models (0.868 for GeoDetector-RF and 0.869 for RFE-RF) were higher than that of the RF model (0.860), indicating that the hybrid models with factor optimization have high reliability and predictability. Both RFE-RF and GeoDetector-RF had higher AUC values, 0.863 and 0.860 respectively, than RF (0.853). These results confirm the ability of factor optimization methods to improve the performance of landslide susceptibility models.
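The frequency ratio (FR) used above to evaluate factors compares, for each class of a factor, the share of landslide cells in that class with the share of all cells in that class: FR = (landslides in class / all landslides) / (cells in class / all cells). FR > 1 indicates a class positively associated with landslides. This is a minimal sketch assuming the factor has already been discretized into classes per map cell; the function name and inputs are illustrative, not from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def frequency_ratio(factor_class: Sequence[Hashable], is_landslide: Sequence[bool]) -> Dict[Hashable, float]:
    """FR per class = (share of landslide cells in the class) / (share of all cells in the class)."""
    if len(factor_class) != len(is_landslide):
        raise ValueError("inputs must have the same length")
    total_cells = len(factor_class)
    total_slides = sum(is_landslide)
    if total_slides == 0:
        raise ValueError("no landslide cells")
    cells = Counter(factor_class)
    slides = Counter(c for c, s in zip(factor_class, is_landslide) if s)
    return {c: (slides[c] / total_slides) / (n / total_cells) for c, n in cells.items()}


if __name__ == "__main__":
    # Toy slope factor: 6 "low" cells (1 landslide), 4 "high" cells (3 landslides).
    classes = ["low"] * 6 + ["high"] * 4
    slides = [True] + [False] * 5 + [True] * 3 + [False]
    print(frequency_ratio(classes, slides))
```

In the toy data the "high" class holds 75% of the landslides but only 40% of the cells, giving FR = 1.875.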
Funding: Supported by the National Research Foundation of Korea grant funded by the Korea government (Ministry of Science and ICT), No. NRF-2019R1F1A1042156, and the Bio&Medical Technology Development Program, No. NRF-2017M3A9E1064784.
Abstract: BACKGROUND Despite advances in operative technique and improvements in postoperative management, postoperative pancreatic fistula (POPF) remains a life-threatening complication following pancreatoduodenectomy (PD). Some reports predict POPF preoperatively or intraoperatively, but their accuracy is questionable. Artificial intelligence (AI) technology is being actively used in the medical field, but few studies have reported applying it to outcomes after PD. AIM To develop a risk prediction platform for POPF using an AI model. METHODS Medical records were reviewed from 1769 patients at Samsung Medical Center who underwent PD from 2007 to 2016. A total of 38 variables were entered into AI-driven algorithms. The algorithms tested to build the risk prediction platform were random forest (RF) and a neural network (NN) with or without recursive feature elimination (RFE). The median imputation method was used for missing values. The area under the curve (AUC) was calculated to examine the discriminative power of the algorithms for POPF prediction. RESULTS There were 221 POPFs (12.5%) according to the 2016 International Study Group of Pancreatic Fistula definition. After median imputation, AUCs using 38 variables were 0.68 ± 0.02 with RF and 0.71 ± 0.02 with NN. The maximal AUC, obtained using NN with RFE, was 0.74. Sixteen risk factors for POPF were identified by the AI algorithm: pancreatic duct diameter, body mass index, preoperative serum albumin, lipase level, amount of intraoperative fluid infusion, age, platelet count, extrapancreatic location of tumor, combined venous resection, co-existing pancreatitis, neoadjuvant radiotherapy, American Society of Anesthesiologists score, sex, soft texture of the pancreas, underlying heart disease, and preoperative endoscopic biliary decompression. We developed a web-based POPF prediction platform, and this application is freely available at http://popfrisk.smchbp.org. CONCLUSION This study is the first to predict POPF with multiple risk factors using AI. This platform is reliable (AUC 0.74), so it could be used to select patients who need especially intensive therapy and to establish an effective treatment strategy preoperatively.
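Two preprocessing and evaluation steps named in the METHODS, median imputation of missing values and AUC as a measure of discriminative power, can be sketched in a few lines. This is an illustrative sketch, not the study's pipeline: missing values are assumed to be encoded as `None`, and AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half).

```python
from statistics import median
from typing import List, Optional, Sequence


def median_impute(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in column]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(median_impute([1.0, None, 3.0, 10.0]))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of this size; a rank-sum formulation gives the same value in O(n log n).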
Funding: Supported by the National Natural Science Foundation of China (No. 81222021, 61172008, 81171423, 81127003), the National Key Technology R&D Program of the Ministry of Science and Technology of China (No. 2012BAI34B02), and the Program for New Century Excellent Talents in University of the Ministry of Education of China (No. NCET-10-0618).
Abstract: Electroencephalographic (EEG)-based emotion recognition has recently received increasing attention in the field of human-computer interaction (HCI). However, a number of challenges remain in building a generalized emotion recognition model, one of which is that an EEG-based emotion classifier trained on a specific task has difficulty handling other tasks. Little attention has been paid to this issue. The current study aims to determine the feasibility of coping with this challenge using feature selection. Twelve healthy volunteers were emotionally elicited while performing picture-induced and video-induced tasks. First, a support vector machine (SVM) classifier was examined under within-task conditions (trained and tested on the same task) and cross-task conditions (trained on one task and tested on another) for the picture-induced and video-induced tasks. Within-task classification performed fairly well (classification accuracy: 51.6% for the picture task and 94.4% for the video task). Cross-task classification, however, deteriorated to low levels (around 44%). When trained and tested with the most robust feature subset selected by SVM-recursive feature elimination (RFE), the performance of the cross-task classifier improved significantly, to above 68%. These results suggest that cross-task emotion recognition is feasible with proper methods, and they bring EEG-based emotion recognition models closer to being able to discriminate emotion states for any task.
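SVM-RFE ranks features by the squared weights w_j² of a trained linear SVM and removes the lowest-ranked feature, then retrains on the survivors. The sketch below is a toy illustration under stated assumptions, not the study's setup: the linear SVM is trained by plain full-batch subgradient descent on the L2-regularized hinge loss, the bias term is omitted (so features are assumed to be roughly centred, e.g. standardized EEG features), and `lam`, `epochs`, and `lr` are arbitrary illustrative values.

```python
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*w.x)) by subgradient descent.

    Labels must be -1/+1. No bias term: assumes centred features.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 term
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X: Sequence[Sequence[float]], y: Sequence[int], n_keep: int = 1) -> List[int]:
    """Return surviving feature indices; each round drops the feature with the smallest w_j^2."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        Xs = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(Xs, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


if __name__ == "__main__":
    # Feature 0 separates the classes; features 1 and 2 are balanced noise.
    X = [[2.0, 1.0, 0.0], [1.5, -1.0, 1.0], [1.0, 0.0, -1.0],
         [-1.0, 0.0, -1.0], [-1.5, -1.0, 1.0], [-2.0, 1.0, 0.0]]
    y = [1, 1, 1, -1, -1, -1]
    print(svm_rfe(X, y, n_keep=1))
```

In a cross-task setting like the one described, the subset surviving this elimination (rather than the full feature set) would be used to train on one task and test on the other.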
Abstract: At present, the prevalence of diabetes, a condition in which the body cannot properly metabolize glucose, is increasing. Accurate prediction of diabetes is an important research area, and many researchers have proposed techniques to predict this disease using data mining and machine learning methods. In prediction, feature selection is a key preprocessing step: only the features relevant to the disease are used for prediction, which improves prediction accuracy. Selecting the right features from the whole feature set is a complicated process, and many researchers concentrate on it to produce predictive models with high accuracy. In this work, a wrapper-based feature selection method called recursive feature elimination is combined with ridge regression (L2) to form a hybrid L2-regularized feature selection algorithm that overcomes the overfitting problem of the data set. Overfitting is a major problem in feature selection: when the training data are small, the model fits them too closely and does not generalize to new data. Ridge regression is mainly used to overcome the overfitting problem. The features are selected using the proposed feature selection method, and a random forest classifier is used to classify the data on the basis of the selected features. This work uses the Pima Indians Diabetes data set, and the results are compared with existing algorithms to demonstrate the accuracy of the proposed algorithm. The proposed algorithm predicts diabetes with an accuracy of 100% and an area under the curve of 97%, outperforming existing algorithms.
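A ridge-based RFE of the kind described above (fit ridge regression, drop the feature with the smallest absolute coefficient, refit) can be sketched with the closed form w = (XᵀX + αI)⁻¹Xᵀy. This is a minimal stdlib-only sketch, not the authors' algorithm: it centres X and y to absorb the intercept, treats the class label as a numeric target, uses an arbitrary `alpha`, and assumes features are on comparable scales (in practice, standardize first, otherwise |w_j| is not a fair ranking criterion).

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (alpha > 0 keeps A nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_coefficients(X: Sequence[Sequence[float]], y: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Solve (Xc^T Xc + alpha*I) w = Xc^T yc on column-centred X and centred y."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(d)]
    return solve(A, b)


def ridge_rfe(X: Sequence[Sequence[float]], y: Sequence[float], n_keep: int, alpha: float = 1.0) -> List[int]:
    """Drop the feature with the smallest |ridge coefficient| until n_keep remain."""
    if not 1 <= n_keep <= len(X[0]):
        raise ValueError("n_keep must be between 1 and the number of features")
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = ridge_coefficients([[row[j] for j in remaining] for row in X], y, alpha)
        worst = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[worst]
    return remaining


if __name__ == "__main__":
    X = [[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, 1.0]]
    y = [3 * r[0] + 0.1 * r[1] + 1 * r[2] for r in X]
    print(ridge_rfe(X, y, n_keep=2))
```

With mutually orthogonal toy columns, XᵀX = 4I, so the ridge solution is simply Xᵀy / (4 + α): coefficients of about 2.4, 0.08, and 0.8, and the weak second feature is eliminated first. The surviving indices would then be passed to the downstream classifier (a random forest in the study).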