To solve multi-class fault diagnosis tasks, a decision tree support vector machine (DTSVM), which combines SVM and a decision tree using the concept of dichotomy, is proposed. Since the classification performance of DTSVM depends heavily on its structure, a genetic algorithm is introduced into the formation of the decision tree to cluster the classes so that the distance between the clustering centers of the two sub-classes is maximized; the most separable classes are thus separated at each node of the decision tree. Numerical simulations on three datasets, compared with the "one-against-all" and "one-against-one" methods, demonstrate that the proposed method has better performance and higher generalization ability than the two conventional methods.
Credit card fraud data is highly imbalanced, with an overwhelmingly large portion of nonfraudulent transactions and a small portion of fraudulent transactions. The measures used to judge the veracity of detection algorithms therefore become critical to deploying a model that accurately scores fraudulent transactions, taking into account case imbalance and the cost of identifying a case as genuine when it is, in fact, a fraudulent transaction. In this paper, a new criterion for judging classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared using this criterion. In addition, a weighted support vector machine (SVM) algorithm that considers the financial cost of misclassification is introduced and proves more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions and a uniform weight for nonfraudulent transactions. The results show that this strategy greatly improves the performance of credit card fraud detection.
This paper presents the fault diagnosis of a face milling tool based on a machine learning approach. During machining, spindle vibration signals in the feed direction are acquired under healthy and faulty conditions of the milling tool. A set of discrete wavelet features is extracted from the vibration signals using the discrete wavelet transform (DWT) technique. The decision tree technique is used to select the significant features from all extracted wavelet features. C-support vector classification (C-SVC) and ν-support vector classification (ν-SVC) models with different support vector machine (SVM) kernel functions are used to study and classify the tool condition based on the selected features. The results show that C-SVC performs better than ν-SVC and achieves 94.5% classification accuracy for face milling of the special steel alloy 42CrMo4.
Renewable energy has garnered attention due to the need for sustainable energy sources. Wind power has emerged as an alternative that has contributed to the transition towards cleaner energy. As the importance of wind energy grows, it becomes crucial to provide forecasts that optimize its performance potential. Artificial intelligence (AI) methods have risen in prominence because they handle complicated systems well while enhancing prediction accuracy. This study explored the use of AI to predict wind-energy production at a wind farm in Yalova, Turkey, using four different AI approaches: support vector machines (SVMs), decision trees, adaptive neuro-fuzzy inference systems (ANFIS), and artificial neural networks (ANNs). Wind speed and direction were used as the essential input parameters, with wind energy as the target parameter, and the models were evaluated using the mean absolute percentage error (MAPE), the coefficient of determination (R²), and the mean absolute error (MAE). The findings highlight the superior performance of the SVM, which delivered the lowest MAPE (2.42%), the highest R² (0.95), and the lowest MAE (71.21%) compared with actual values, while ANFIS was less effective in this context. The main aim of this comparative analysis was to rank the models so that, as a next step, the least efficient methods can be improved by combining them with optimization algorithms such as metaheuristic algorithms.
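The three evaluation metrics named in this abstract can be illustrated with a minimal sketch. The function names and the toy values below are assumptions for illustration only; they are not taken from the study, and MAPE is reported here in percent while MAE is left in the units of the target.

```python
def mean_absolute_error(actual, predicted):
    """MAE: average absolute deviation, in the units of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; undefined if any actual value is zero."""
    ratios = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical energy values, not data from the paper.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
r2 = r_squared(actual, predicted)
```

A lower MAPE and MAE and a higher R² all indicate a closer fit, which is the sense in which the abstract ranks the four models.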
Heart failure is now widespread throughout the world. Heart disease affects approximately 48% of the population, and the disease is expensive and difficult to cure. This research paper presents machine learning models for predicting heart failure. The fundamental concept is to compare the correctness of various machine learning (ML) algorithms and boosting algorithms in order to improve the models' prediction accuracy. Supervised algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR) are considered to achieve the best results. Boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost are also used to improve the prediction, along with Artificial Neural Networks (ANN). This research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and scikit-learn are used for ML, and TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy among the classifiers, 95%. KNN obtained the second-highest accuracy, 93.33%. XGBoost had a satisfactory accuracy of 91.67%; SVM, CatBoost, and ANN had an accuracy of 90%; and LR had an accuracy of 88.33%.
The ongoing effort to create methods for detecting and quantifying fatigue damage is motivated by the high levels of uncertainty in present fatigue-life prediction approaches and the frequently catastrophic nature of fatigue failure. In this study, the fatigue life of the high-strength aluminum alloy 2090-T83 is predicted using a variety of artificial intelligence and machine learning techniques for constant amplitude and negative stress ratios (R?1). Artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), support-vector machines (SVM), a random forest model (RF), and an extreme-gradient tree-boosting model (XGB) are trained using numerical and experimental input data obtained from fatigue tests based on a relatively low number of stress measurements. In particular, the coefficients of the traditional force law formula are found using relevant numerical methods. It is shown that, in comparison with traditional approaches, the neural network and neuro-fuzzy models produce better results, with the neural network models trained using the boosting iterations technique providing the best performance. By building strong models from weak models, XGB helps to predict fatigue life by reducing model bias and variance in supervised learning. Fuzzy neural models can be used to predict the fatigue life of alloys more accurately than neural networks and traditional methods.
Landslides are abundant in mountainous regions, where they are responsible for substantial damage and losses. The A1 Highway, an important road in Algeria, was in places constructed through mountainous and semi-mountainous areas. Previous landslide susceptibility mapping studies conducted near this road using statistical and expert methods have yielded ordinary results. This research examines how machine learning techniques can help increase the accuracy of landslide susceptibility maps in the vicinity of the A1 Highway corridor. To this end, an important section at Ain Bouziane (NE Algeria) is chosen as a case study to evaluate landslide susceptibility using three different machine learning methods, namely random forest (RF), support vector machine (SVM), and boosted regression tree (BRT). First, an inventory map and nine input factors were prepared for the landslide susceptibility mapping (LSM) analyses. The three models were then constructed to find the areas most susceptible to this phenomenon. The results were assessed by calculating the receiver operating characteristic (ROC) curve, the standard error (Std. error), and the 95% confidence interval (CI). The RF model reached the highest predictive accuracy (AUC = 97.2%) compared with the other models. The outcomes of this research show that the resulting machine learning models are able to predict future landslide locations in this important road section. In addition, their application improves the accuracy of LSMs near the road corridor. The machine learning models may become an important prediction tool for identifying landslide alleviation actions.
Posterior probability support vector machines (PPSVMs) prove robust against noise and outliers and need to store fewer support vectors (SVs). Gonen et al. (2008) extended PPSVMs to the multiclass case by both single-machine and multimachine approaches. However, these extensions suffer from low classification efficiency, a high computational burden, and, more importantly, unclassifiable regions. To achieve higher classification efficiency and accuracy with fewer SVs, a binary tree of PPSVMs for the multiclass classification problem is proposed in this letter. Moreover, a Fisher ratio separability measure is adopted to determine the tree structure. Several experiments on handwriting recognition datasets are included to illustrate the proposed approach. Specifically, the binary tree of PPSVMs accelerated by Fisher ratio separability obtains an overall test accuracy that is, if not higher than, at least comparable to that of other multiclass algorithms, while using significantly fewer SVs and much less test time.
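As a rough illustration of how a Fisher ratio separability measure could guide the choice of a split at one tree node, the sketch below scores every two-way grouping of the classes and keeps the most separable one. The abstract does not give the exact formula or search procedure, so the ratio used here (squared distance between group means divided by the summed within-group scatter), the exhaustive search, and all names and data are assumptions.

```python
from itertools import combinations


def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def scatter(points, center):
    """Mean squared Euclidean distance of the points to their center."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, center)) for p in points
    ) / len(points)


def fisher_ratio(group_a, group_b):
    """One common form: between-group distance over within-group scatter."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    between = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    within = scatter(group_a, mean_a) + scatter(group_b, mean_b)
    return between / within


def best_split(samples_by_class):
    """Try every two-way grouping of class labels; return the most separable."""
    labels = sorted(samples_by_class)
    best = None
    for size in range(1, len(labels) // 2 + 1):
        for left in combinations(labels, size):
            right = tuple(l for l in labels if l not in left)
            pts_l = [p for l in left for p in samples_by_class[l]]
            pts_r = [p for l in right for p in samples_by_class[l]]
            score = fisher_ratio(pts_l, pts_r)
            if best is None or score > best[0]:
                best = (score, left, right)
    return best


# Hypothetical toy data: class 2 lies far from classes 0 and 1.
data = {
    0: [(0.0, 0.0), (0.0, 1.0)],
    1: [(1.0, 0.0), (1.0, 1.0)],
    2: [(10.0, 10.0), (10.0, 11.0)],
}
score, left, right = best_split(data)
```

Here the root node would separate class 2 from classes 0 and 1, and the same procedure could then be repeated on the remaining group. Exhaustive search is only practical for a small number of classes.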
This article aims to assess health habits, safety behaviors, and anxiety factors in the community during the novel coronavirus disease (COVID-19) pandemic in Saudi Arabia, based on primary data collected through a questionnaire with 320 respondents. In other words, this paper aims to provide empirical insights into the correlation and correspondence between sociodemographic factors (gender, nationality, age, citizenship factors, income, and education) and the psycho-behavioral effects on individuals in response to the emergence of this new pandemic. To focus on the interaction between these variables and their effects, we suggest different methods of analysis, comprising regression trees and support vector machine regression (SVMR) algorithms. According to the regression tree results, the age variable plays a predominant role in health habits, safety behaviors, and anxiety. The health habit index, which focuses on the extent of behavioral change toward a commitment to using health and protection methods, is highly affected by gender and age. The average monthly income is also a relevant factor but has contrasting effects during the COVID-19 pandemic period. The results of the SVMR model reveal a strong positive effect of income, with R² values of 99.59%, 99.93%, and 99.88% for health habits, safety behaviors, and anxiety, respectively.
Background: The accurate estimation of soil nutrient content is particularly important in view of its impact on plant growth and forest regeneration. To investigate soil nutrient content and quality for the natural regeneration of Dacrydium pectinatum communities in China, advanced and accurate estimation methods need to be designed. Methods: This study used machine learning techniques to create a series of comprehensive and novel models for evaluating soil nutrient content. Soil nutrient evaluation methods were built using six support vector machines and four artificial neural networks. Results: The generalized regression neural network model was the best artificial neural network evaluation model, with the smallest root mean square error (5.1), mean error (-0.85), and mean square prediction error (29). The accuracy rate of the combined k-nearest neighbors (k-NN) local support vector machine model (i.e., k-nearest neighbors-support vector machine (KNNSVM)) for soil nutrient evaluation was high compared with the other five partial support vector machine models investigated. The area under the curve value of the generalized regression neural network (0.6572) was the highest, and the cross-validation result showed that the generalized regression neural network reached 92.5%. Conclusions: Both the KNNSVM and generalized regression neural network models can be used effectively to evaluate soil nutrient content and quality grades in conjunction with appropriate model variables. By developing a new, feasible evaluation method for assessing soil nutrient quality for Dacrydium pectinatum, this study provides results that can be used as a reference for the adaptive management of rare and endangered tree species. This study, however, found some uncertainties in data acquisition and model simulations, which will be investigated in upcoming studies.
Support Vector Clustering (SVC) is a kernel-based unsupervised clustering method. The main drawback of SVC is the high computational complexity of obtaining the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel, which is the right criterion for generating a minimum spanning tree (MST) using Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking only the linkages between the edges that form the main stem of the MST, for which a non-compatibility degree is defined to support edge selection during linkage estimation. This new approach is analyzed experimentally. The results show that the revised algorithm performs better than the proximity graph model, with faster speed, optimized clustering quality, and strong noise suppression ability, which makes SVC scalable to large data sets.
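The first two steps described here, a feature-space distance induced by a Gaussian kernel and an MST built with Kruskal's algorithm, can be sketched as follows. For a Gaussian kernel K(x, y) = exp(-q‖x - y‖²), the squared distance between the images of x and y in Hilbert space is K(x, x) - 2K(x, y) + K(y, y) = 2 - 2K(x, y). The kernel width `q`, the function names, and the sample points are assumptions; the main-stem selection and the non-compatibility degree from the paper are not reproduced.

```python
import math


def kernel_distance(x, y, q=1.0):
    """Distance between the feature-space images of x and y (Gaussian kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(2.0 - 2.0 * math.exp(-q * sq_dist))


def kruskal_mst(points, q=1.0):
    """Return the MST as a list of (i, j, weight) edges with i < j."""
    n = len(points)
    edges = sorted(
        (kernel_distance(points[i], points[j], q), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    mst = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            mst.append((i, j, weight))
    return mst


# Hypothetical points forming two tight pairs far apart.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
mst = kruskal_mst(points)
```

Because the kernel distance saturates at √2 for distant points, long MST edges are natural candidates for cluster boundaries, which is consistent with checking connectivity only along selected edges.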
Because of the increasing attention to environmental issues, especially air pollution, predicting whether a day will be polluted is important for people's health. To address this problem, this research classifies the ground ozone level based on big data and machine learning models, where a polluted ozone day has class 1 and a non-ozone day has class 0. The dataset used in this research was obtained from the UCI website and contains various environmental factors in the Houston, Galveston, and Brazoria area that could affect the occurrence of ozone pollution [1]. The missing values in the dataset are first filled in for further processing; the data are then standardized so that every feature has the same weight and split into a training set and a testing set. Five different machine learning models are then used to predict the ground ozone level, and their final accuracy scores are compared. Among Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Support Vector Machine (SVM), the last has the highest test score, 0.949. This research uses relatively simple forecasting methods and calculates the first accuracy scores for predicting the ground ozone level; it can thus serve as a reference for environmentalists. Moreover, the direct comparison among the five models gives the machine learning field insight into which model is the most accurate. In the future, neural networks can also be used to predict air pollution, and their test scores can be compared with those of the five methods above to assess their accuracy.
Interior Alaska has a short growing season of 110 d. Knowledge of the timing of crop flowering and maturity provides information for agricultural decision making. In this study, six machine learning algorithms, namely Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs), k-nearest neighbor (kNN), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Random Forest (RF), were selected to forecast the timing of barley flowering and maturity based on the Alaska Crop Datasets and climate data from 1991 to 2016 in Fairbanks, Alaska. Among the 32 models fit to forecast flowering time, two from LDA, 12 from SVMs, four from NB, and three from RF outperformed the models from other algorithms, with the highest accuracy. Models from kNN performed worst in forecasting flowering time. Among the 32 models fit to forecast maturity time, two models from LDA outperformed the models from other algorithms. Models from kNN and RPART performed worst in forecasting maturity time. Models from machine learning methods also provided a variable importance explanation. In this study, four out of six algorithms gave the same variable importance order. Sowing date was the most important variable for forecasting flowering but a less important variable for forecasting maturity. The daily maximum temperature may be more important than the daily minimum temperature for fitting flowering models, while the daily minimum temperature may be more important than the daily maximum temperature for fitting maturity models. The results indicate that machine learning models provide a promising technique for forecasting the timing of flowering and maturity of barley.
Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models to real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared: logistic regression, neural networks, random forest, boosted tree, and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains, such as insurance and telecommunications, to avoid or detect fraudulent activity.
In the last decade, a few valuable studies have been conducted to discriminate fractured zones from non-fractured ones. In this paper, petrophysical and image logs of eight wells were utilized to detect fractured zones. Four classifiers (decision tree, random forest, support vector machine, and deep learning) were applied to the petrophysical logs and image logs for both training and testing. The outputs of the classifiers were fused by ordered weighted averaging data fusion to achieve more reliable, accurate, and general results. An accuracy of close to 99% has been achieved. This study reports a significant improvement over existing work, which has an accuracy of close to 80%.
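Ordered weighted averaging (OWA) differs from a plain weighted mean in that the weights attach to rank positions rather than to particular classifiers: the scores are sorted in descending order before being combined. A minimal sketch follows; the classifier scores and the weight vector are hypothetical, and how the study chose its weights is not stated in the abstract.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to the sorted values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)  # largest score gets weights[0]
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical fracture probabilities from four classifiers for one depth.
scores = [0.9, 0.6, 0.8, 0.7]
fused = owa(scores, [0.4, 0.3, 0.2, 0.1])

# Special cases: all weight on the first rank gives the maximum,
# and equal weights give the arithmetic mean.
fused_max = owa(scores, [1.0, 0.0, 0.0, 0.0])
fused_mean = owa(scores, [0.25, 0.25, 0.25, 0.25])
```

Shifting weight toward the first ranks makes the fusion more optimistic (closer to the maximum), while shifting it toward the last ranks makes it more conservative.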
Fraudulent transactions, driven by several factors, are harming the finances of many individuals across the globe. This research focuses on developing a mechanism that integrates various optimized machine learning algorithms to ensure the security and integrity of digital transactions. It proposes a novel methodology with three stages. First, the Synthetic Minority Oversampling Technique (SMOTE) is applied to obtain balanced data. Second, the SMOTE-balanced data is fed to nature-inspired metaheuristic (MH) algorithms, namely Binary Harris Hawks Optimization (BinHHO), Binary Aquila Optimization (BAO), and Binary Grey Wolf Optimization (BGWO), for feature selection. BinHHO performed well compared with the other two. Third, the features selected by BinHHO are fed to supervised learning algorithms to classify the transactions as fraud or non-fraud. The efficiency of BinHHO is analyzed against other popular MH algorithms. BinHHO achieved the highest accuracy, 99.95%, and demonstrates a more significant positive effect on the performance of the proposed model.
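The first stage can be pictured with a small sketch of the core SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration under assumed names, parameters, and data, not the implementation used in the study.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class (fraud) samples with two features.
fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(fraud, n_new=5)
```

Because every synthetic point is an interpolation between two existing minority samples, the new data stays inside the region already occupied by the minority class.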
Finger vein recognition is a biometric technique that identifies individuals using their unique finger vein patterns. It is reported to have high accuracy and rapid processing speed. In addition, it is impossible to steal a vein pattern located inside the finger. We propose a new method for identifying finger vascular patterns using a weighted local binary pattern (LBP) and a support vector machine (SVM). This research is novel in the following three ways. First, holistic codes are extracted through the LBP method without using a vein detection procedure, which reduces the processing time and the complexity of detecting finger vein patterns. Second, we use the SVM classifier to classify the local areas from which the LBP codes are extracted into three categories: local areas that include a large amount (LA), a medium amount (MA), and a small amount (SA) of vein patterns. Third, different weights are assigned to the extracted LBP codes according to the type of local area (LA, MA, or SA) from which they were extracted. The optimal weights are determined empirically in terms of the accuracy of finger vein recognition. Experimental results show that the equal error rate (EER) of our method is significantly lower than that obtained without the proposed method or with a conventional method.
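A basic 3×3 LBP code and a weighted comparison of two code sequences can be sketched as below. The neighbour ordering, the `>=` comparison, the weighted Hamming distance, and the example weights are assumptions chosen for illustration; the abstract does not specify the paper's exact LBP variant or its matching score.

```python
# Neighbour offsets, clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code: bit is 1 where the neighbour is >= the center pixel."""
    center = image[row][col]
    code = 0
    for d_row, d_col in OFFSETS:
        bit = 1 if image[row + d_row][col + d_col] >= center else 0
        code = (code << 1) | bit
    return code


def weighted_hamming(codes_a, codes_b, weights):
    """Normalized Hamming distance where each code has its own weight."""
    total = sum(
        w * bin(a ^ b).count("1") for a, b, w in zip(codes_a, codes_b, weights)
    )
    return total / (8 * sum(weights))


# Hypothetical 3x3 grey-level patch.
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch, 1, 1)

# Hypothetical weights: a vein-rich area (LA) counts three times
# as much as a vein-poor area (SA).
distance = weighted_hamming([88, 0], [88, 255], weights=[3, 1])
```

With these weights, a mismatch in a vein-poor area contributes less to the distance than the same mismatch in a vein-rich area, which is the intuition behind weighting codes by local area type.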
Funding (decision tree SVM fault diagnosis paper): Supported by the National Natural Science Foundation of China (60604021, 60874054).
Funding (heart failure prediction paper): Taif University Researchers Supporting Project Number (TURSP-2020/73), Taif University, Taif, Saudi Arabia.
Funding: Project (Nos. 60874104 and 70971020) supported by the National Natural Science Foundation of China
Abstract: Posterior probability support vector machines (PPSVMs) prove robust against noise and outliers and need to store fewer support vectors (SVs). Gonen et al. (2008) extended PPSVMs to the multiclass case by both single-machine and multimachine approaches. However, these extensions suffer from low classification efficiency, high computational burden, and, more importantly, unclassifiable regions. To achieve higher classification efficiency and accuracy with fewer SVs, a binary tree of PPSVMs for the multiclass classification problem is proposed in this letter. Moreover, a Fisher ratio separability measure is adopted to determine the tree structure. Several experiments on handwritten recognition datasets are included to illustrate the proposed approach. Specifically, the Fisher ratio separability accelerated binary tree of PPSVMs obtains overall test accuracy that is, if not higher than, at least comparable to that of other multiclass algorithms, while using significantly fewer SVs and much less test time.
Abstract: This article aims to assess health habits, safety behaviors, and anxiety factors in the community during the novel coronavirus disease (COVID-19) pandemic in Saudi Arabia based on primary data collected through a questionnaire with 320 respondents. In other words, this paper aims to provide empirical insights into the correlation and the correspondence between sociodemographic factors (gender, nationality, age, citizenship factors, income, and education) and psycho-behavioral effects on individuals in response to the emergence of this new pandemic. To focus on the interaction between these variables and their effects, we suggest different methods of analysis, comprising regression trees and support vector machine regression (SVMR) algorithms. According to the regression tree results, the age variable plays a predominant role in health habits, safety behaviors, and anxiety. The health habit index, which focuses on the extent of behavioral change toward the commitment to use the health and protection methods, is highly affected by gender and age factors. The average monthly income is also a relevant factor but has contrasting effects during the COVID-19 pandemic period. The results of the SVMR model reveal a strong positive effect of income, with R^2 values of 99.59%, 99.93%, and 99.88% corresponding to health habits, safety behaviors, and anxiety, respectively.
Funding: Financially supported by the Fundamental Research Funds for the Central Non-profit Research Institution of CAF (CAFBB2017ZB004).
Abstract: Background: The accurate estimation of soil nutrient content is particularly important in view of its impact on plant growth and forest regeneration. In order to investigate soil nutrient content and quality for the natural regeneration of Dacrydium pectinatum communities in China, designing advanced and accurate estimation methods is necessary. Methods: This study uses machine learning techniques to create a series of comprehensive and novel models with which to evaluate soil nutrient content. Soil nutrient evaluation methods were built by using six support vector machines and four artificial neural networks. Results: The generalized regression neural network model was the best artificial neural network evaluation model, with the smallest root mean square error (5.1), mean error (-0.85), and mean square prediction error (29). The accuracy rate of the combined k-nearest neighbors (k-NN) local support vector machine model (i.e., k-nearest neighbors-support vector machine (KNNSVM)) for soil nutrient evaluation was high compared to the other five partial support vector machine models investigated. The area under curve value of the generalized regression neural network (0.6572) was the highest, and the cross-validation result showed that the generalized regression neural network reached 92.5%. Conclusions: Both the KNNSVM and generalized regression neural network models can be effectively used to evaluate soil nutrient content and quality grades in conjunction with appropriate model variables. By developing a new feasible evaluation method to assess soil nutrient quality for Dacrydium pectinatum, this study provides results that can be used as a reference for the adaptive management of rare and endangered tree species. This study, however, found some uncertainties in data acquisition and model simulations, which will be investigated in upcoming studies.
Funding: The National High Technology Research and Development Program of China (No. 863-511-930-009)
Abstract: Support Vector Clustering (SVC) is a kernel-based unsupervised clustering method. The main drawback of SVC is its high computational complexity in obtaining the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel, which is the right criterion to generate a minimum spanning tree (MST) using Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking only the linkages between the edges that form the main stem of the MST, for which a non-compatibility degree is newly defined to support edge selection during linkage estimation. This new approach is experimentally analyzed. The results show that the revised algorithm performs better than the proximity graph model, with faster speed, optimized clustering quality, and strong noise suppression ability, which makes SVC scalable to large data sets.
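The MST construction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard kernel-induced distance, d(x, y)^2 = K(x, x) + K(y, y) - 2K(x, y), which for a Gaussian kernel reduces to 2 - 2K(x, y). The function names, the kernel width `q`, and the sample points are hypothetical, and the paper's non-compatibility degree and main-stem selection are not reproduced here.

```python
import math

def gaussian_kernel(x, y, q=1.0):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2); q is an assumed width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-q * sq_dist)

def feature_space_distance(x, y, q=1.0):
    """Distance between the images of x and y in the kernel's feature space."""
    # ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2K(x,y) = 2 - 2K(x,y)
    return math.sqrt(max(0.0, 2.0 - 2.0 * gaussian_kernel(x, y, q)))

def kruskal_mst(points, q=1.0):
    """Return MST edges (i, j, weight) over the complete graph on `points`."""
    n = len(points)
    edges = sorted(
        (feature_space_distance(points[i], points[j], q), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    mst = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # edge joins two components: keep it
            parent[root_i] = root_j
            mst.append((i, j, weight))
            if len(mst) == n - 1:
                break
    return mst

# Two well-separated groups of hypothetical 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
mst = kruskal_mst(points, q=1.0)
heaviest = max(mst, key=lambda edge: edge[2])
```

In this toy example the heaviest MST edge is the single link between the two groups, which is the kind of edge a connectivity check would need to examine.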
Abstract: Because of the increasing attention on environmental issues, especially air pollution, predicting whether a day is polluted or not is necessary to people’s health. In order to solve this problem, this research classifies ground ozone level based on big data and machine learning models, where a polluted ozone day has class 1 and a non-ozone day has class 0. The dataset used in this research was derived from the UCI Website and contains various environmental factors in the Houston, Galveston and Brazoria area that could possibly affect the occurrence of ozone pollution [1]. This dataset is first filled in for further processing, next standardized to ensure every feature has the same weight, and then split into a training set and a testing set. After this, five different machine learning models are used in the prediction of ground ozone level and their final accuracy scores are compared. In conclusion, among Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Support Vector Machine (SVM), the last one has the highest test score of 0.949. This research utilizes relatively simple methods of forecasting and calculates the first accuracy scores in predicting ground ozone level; it can thus be a reference for environmentalists. Moreover, the direct comparison among five different models provides the machine learning field with insight for determining the most accurate model. In the future, a Neural Network can also be utilized to predict air pollution, and its test scores can be compared with those of the previous five methods to assess the accuracy of the Neural Network.
Abstract: Interior Alaska has a short growing season of 110 d. Knowledge of the timing of crop flowering and maturity provides information for agricultural decision making. In this study, six machine learning algorithms, namely Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs), k-nearest neighbor (kNN), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Random Forest (RF), were selected to forecast the timings of barley flowering and maturity based on the Alaska Crop Datasets and climate data from 1991 to 2016 in Fairbanks, Alaska. Among 32 models fit to forecast flowering time, two from LDA, 12 from SVMs, four from NB, and three from RF outperformed models from other algorithms with the highest accuracy. Models from kNN performed worst in forecasting flowering time. Among 32 models fit to forecast maturity time, two models from LDA outperformed the models from other algorithms. Models from kNN and RPART performed worst in forecasting maturity time. Models from machine learning methods also provided a variable importance explanation. In this study, four out of six algorithms gave the same variable importance order. Sowing date was the most important variable for forecasting flowering but a less important variable for forecasting maturity. The daily maximum temperature may be more important than the daily minimum temperature for fitting flowering models, while the daily minimum temperature may be more important than the daily maximum temperature for fitting maturity models. The results indicate that models from machine learning provide a promising technique for forecasting the timings of flowering and maturity of barley.
Abstract: Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models on real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared, including logistic regression, neural networks, random forest, boosted tree, and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains such as insurance and telecommunications, to avoid or detect fraudulent activity.
Abstract: In the last decade, a few valuable studies have been conducted to discriminate fractured zones from non-fractured ones. In this paper, petrophysical and image logs of eight wells were utilized to detect fractured zones. Decision tree, random forest, support vector machine, and deep learning were the four classifiers applied to petrophysical logs and image logs for both training and testing. The outputs of the classifiers were fused by ordered weighted averaging data fusion to achieve more reliable, accurate, and general results. An accuracy of close to 99% has been achieved. This study reports a significant improvement compared to existing work, which has an accuracy of close to 80%.
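The fusion step named in this abstract, ordered weighted averaging (OWA), can be sketched as follows. In OWA the inputs are sorted in descending order and the weights are applied by rank rather than by source. The classifier scores and the weight vector below are hypothetical illustrations; the abstract does not state how the study's weights were chosen.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

# Hypothetical fracture probabilities from four classifiers for one depth sample.
scores = [0.9, 0.6, 0.8, 0.7]
# Hypothetical rank weights: the largest score gets 0.4, the smallest 0.1.
weights = [0.4, 0.3, 0.2, 0.1]
fused = owa(scores, weights)
```

With weights `[1, 0, 0, 0]` the operator returns the maximum, with `[0, 0, 0, 1]` the minimum, and with uniform weights the arithmetic mean, so the weight vector controls how optimistic the fused decision is.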
Abstract: Fraudulent transactions, driven by several factors, are harming the finances of many individuals across the globe. This research focuses on developing a mechanism that integrates various optimized machine-learning algorithms to ensure the security and integrity of digital transactions. This research proposes a novel methodology with three stages. First, the Synthetic Minority Oversampling Technique (SMOTE) is applied to obtain balanced data. Second, the SMOTE-balanced data are fed to nature-inspired Meta Heuristic (MH) algorithms, namely Binary Harris Hawks Optimization (BinHHO), Binary Aquila Optimization (BAO), and Binary Grey Wolf Optimization (BGWO), for feature selection. BinHHO performed well compared with the other two. Third, the features from BinHHO are fed to supervised learning algorithms to classify the transactions as fraud or non-fraud. The efficiency of BinHHO is analyzed against other popular MH algorithms. BinHHO achieved the highest accuracy of 99.95% and demonstrates a more significant positive effect on the performance of the proposed model.
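The balancing stage can be illustrated with a simplified sketch of the core SMOTE idea: each synthetic minority sample is placed at a random position on the segment between a minority point and one of its nearest minority neighbors. This is an illustrative reduction, not the study's implementation; the function name, the neighbor count `k`, the seed, and the sample data are all assumptions.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbor)))
    return synthetic

# Hypothetical minority-class (fraud) samples with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class.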
Funding: Project (No. R112002105070020 (2010)) supported by the National Research Foundation of Korea (NRF) through the Biometrics Engineering Research Center (BERC) at Yonsei University
Abstract: Finger vein recognition is a biometric technique which identifies individuals using their unique finger vein patterns. It is reported to have high accuracy and rapid processing speed. In addition, it is impossible to steal a vein pattern located inside the finger. We propose a new identification method of finger vascular patterns using a weighted local binary pattern (LBP) and support vector machine (SVM). This research is novel in the following three ways. First, holistic codes are extracted through the LBP method without using a vein detection procedure. This reduces the processing time and the complexities in detecting finger vein patterns. Second, we classify the local areas from which the LBP codes are extracted into three categories based on the SVM classifier: local areas that include a large amount (LA), a medium amount (MA), and a small amount (SA) of vein patterns. Third, different weights are assigned to the extracted LBP code according to the local area type (LA, MA, and SA) from which the LBP codes were extracted. The optimal weights are determined empirically in terms of the accuracy of the finger vein recognition. Experimental results show that our equal error rate (EER) is significantly lower than that obtained without the proposed method or with a conventional method.
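The idea of weighting LBP codes by local area type can be sketched as below. The sketch computes a basic 8-bit LBP code for a 3x3 patch and compares two code sequences with a weighted Hamming distance. The choice of Hamming distance as the matching score, the bit ordering, and the weight values are assumptions made for illustration; the abstract states only that the weights are determined empirically.

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch: bit i is set if neighbor i >= center."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left pixel.
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code

def weighted_hamming(codes_a, codes_b, weights):
    """Normalized Hamming distance where each code's bit differences are weighted."""
    total = sum(w * 8 for w in weights)
    diff = sum(
        w * bin(a ^ b).count("1")
        for a, b, w in zip(codes_a, codes_b, weights)
    )
    return diff / total

patch = [[5, 9, 1], [4, 6, 7], [2, 3, 8]]
code = lbp_code(patch)

# Hypothetical codes from two local areas: the first typed LA (weight 3.0),
# the second typed SA (weight 1.0).
enrolled = [0b11110000, 0b00001111]
probe = [0b11110000, 0b00000000]
distance = weighted_hamming(enrolled, probe, weights=[3.0, 1.0])
```

Here the mismatch falls in the low-weight area, so the distance stays small; the same mismatch in the high-weight area would count three times as much.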