Funding: Supported by the joint fund of the National Natural Science Foundation of China and the Civil Aviation Administration Foundation of China (No. U1233201).
Abstract: A fault diagnosis model is proposed based on a fuzzy support vector machine (FSVM) combined with fuzzy clustering (FC). Considering the relationship between a sample point and its non-self class, the FC algorithm is applied to generate fuzzy memberships. In the algorithm, sample weights based on a distribution density function of the data points and a genetic algorithm (GA) are introduced to enhance the performance of FC. Then a multi-class FSVM with a radial basis function kernel is established according to the directed acyclic graph algorithm, and its penalty factor and kernel parameter are optimized by GA. Finally, the model is applied to multi-class fault diagnosis of rolling element bearings. The results show that the presented model performs well in identifying both fault types and fault degrees. Comparisons of the presented model with SVM and distance-based FSVM in the noisy case demonstrate its capacity to handle noise and to generalize.
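The abstract uses a distance-based FSVM as a baseline. As a rough illustration of how such a baseline commonly derives fuzzy memberships (this is not the FC-based scheme proposed in the paper), the sketch below lets the membership of each sample fall off linearly with its distance from its own class centre. The function name `distance_memberships` and the constant `delta` are assumptions made for this example.

```python
import math

def distance_memberships(points, delta=1e-3):
    """Membership s_i = 1 - d_i / (r + delta) for the samples of one class.

    d_i is the Euclidean distance of sample i to the class mean and r is
    the largest such distance. `delta` is an assumed small constant that
    keeps every membership strictly positive.
    """
    dim = len(points[0])
    centre = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    dists = [math.dist(p, centre) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]

# Three clustered samples and one outlier: the outlier gets the lowest weight.
memberships = distance_memberships(
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
)
```

In an FSVM these memberships would typically scale the per-sample penalty, so that suspected outliers contribute less to the training objective.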
Funding: The Natural Science Foundation of Ningxia Province (No. 2021AAC03230).
Abstract: Brain tumors come in various types, each with distinct characteristics and treatment approaches, making manual detection a time-consuming and potentially ambiguous process. Brain tumor detection is a valuable tool for gaining a deeper understanding of tumors and improving treatment outcomes. Machine learning models have become key players in automating brain tumor detection, and gradient descent methods are the mainstream algorithms for solving such models. In this paper, we propose a novel distributed proximal stochastic gradient descent approach to solve the L1-Smooth Support Vector Machine (SVM) classifier for brain tumor detection. Firstly, the smooth hinge loss is introduced as the loss function of the SVM. It avoids the nondifferentiability at the zero point that the traditional hinge loss encounters during gradient descent optimization. Secondly, L1 regularization is employed to sparsify features and enhance the robustness of the model. Finally, adaptive proximal stochastic gradient descent (PGD) with momentum and distributed adaptive PGD with momentum (DPGD) are proposed and applied to the L1-Smooth SVM. Distributed computing is crucial in large-scale data analysis: extending algorithms to distributed clusters enables more efficient processing of massive amounts of data. The DPGD algorithm leverages Spark, enabling full utilization of the computer's multi-core resources. Owing to the sparsity induced by L1 regularization of the parameters, it converges significantly faster. From the perspective of loss reduction, DPGD converges faster than PGD. The experimental results show that adaptive PGD with momentum and its variants achieve cutting-edge accuracy and efficiency in brain tumor detection. With pre-trained models, both PGD and DPGD outperform the other models, reaching an accuracy of 95.21%.
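The combination of a smooth hinge loss with an L1 proximal step can be sketched as follows. This is a minimal single-sample illustration under stated assumptions: the quadratically smoothed hinge used here is one common variant and may differ from the one in the paper, and the momentum, adaptive step size, and Spark distribution described in the abstract are omitted. All function names are hypothetical.

```python
def smooth_hinge_grad(z):
    """Derivative of an assumed smooth hinge loss in the margin z = y * (w . x).

    Loss: 0 for z >= 1, (1 - z)^2 / 2 for 0 < z < 1, 0.5 - z for z <= 0.
    """
    if z >= 1.0:
        return 0.0
    if z <= 0.0:
        return -1.0
    return z - 1.0

def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def proximal_sgd_step(w, x, y, lr, lam):
    """One proximal stochastic gradient step on a sample (x, y), y in {-1, +1}."""
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    g = smooth_hinge_grad(z)
    # Gradient step on the smooth loss, followed by the L1 proximal step.
    return [soft_threshold(wi - lr * g * y * xi, lr * lam)
            for wi, xi in zip(w, x)]

# The weakly informative second feature is driven exactly to zero.
w = proximal_sgd_step([0.0, 0.0], x=[1.0, 0.01], y=1, lr=0.5, lam=0.1)
```

The soft-thresholding step is what produces exact zeros in the weight vector, which is the sparsity effect the abstract attributes to L1 regularization.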
Abstract: Urban living in large modern cities exerts considerable adverse effects on health and thus increases the risk of contracting several chronic kidney diseases (CKD). The prediction of CKDs has become a major task in urbanized countries. The primary objective of this work is to introduce and develop predictive analytics for predicting CKDs. However, prediction on huge samples is becoming increasingly difficult. Meanwhile, MapReduce provides a feasible framework for programming predictive algorithms with map and reduce functions. The relatively simple programming interface helps solve problems in the scalability and efficiency of predictive learning algorithms. In the proposed work, the iterative weighted MapReduce framework is introduced for the effective management of large dataset samples. A binary classification problem is formulated using ensemble nonlinear support vector machines and random forests. Thus, instead of using the normal linear combination of kernel activations, the proposed work creates nonlinear combinations of kernel activations in prototype examples. Furthermore, different descriptors are combined in an ensemble of deep support vector machines, where the product rule is used to combine the probability estimates of different classifiers. Performance is evaluated in terms of the prediction accuracy and interpretability of the model and the results.
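The product rule mentioned at the end of the abstract can be illustrated in a few lines. The sketch below assumes each classifier outputs a probability vector over the same classes; the function name `product_rule` and the example numbers are made up for illustration and are not taken from the paper.

```python
def product_rule(prob_lists):
    """Combine per-classifier class-probability vectors by the product rule.

    Each inner list holds one classifier's probabilities for all classes.
    The per-class products are renormalized so the result sums to 1.
    """
    n_classes = len(prob_lists[0])
    combined = [1.0] * n_classes
    for probs in prob_lists:
        for k in range(n_classes):
            combined[k] *= probs[k]
    total = sum(combined)
    return [c / total for c in combined]

# Two classifiers favour class 0, one favours class 1.
fused = product_rule([[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]])
```

Because the rule multiplies estimates, a single classifier that assigns a class a probability near zero can veto that class, which is the main design difference from simple averaging.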
Funding: Project (No. R112002105070020(2010)) supported by the National Research Foundation of Korea (NRF) through the Biometrics Engineering Research Center (BERC) at Yonsei University.
Abstract: Finger vein recognition is a biometric technique that identifies individuals using their unique finger vein patterns. It is reported to have high accuracy and rapid processing speed. In addition, it is impossible to steal a vein pattern located inside the finger. We propose a new identification method for finger vascular patterns using a weighted local binary pattern (LBP) and a support vector machine (SVM). This research is novel in the following three ways. First, holistic codes are extracted through the LBP method without using a vein detection procedure. This reduces the processing time and the complexity of detecting finger vein patterns. Second, we classify the local areas from which the LBP codes are extracted into three categories based on the SVM classifier: local areas that include a large amount (LA), a medium amount (MA), and a small amount (SA) of vein patterns. Third, different weights are assigned to the extracted LBP codes according to the local area type (LA, MA, or SA) from which they were extracted. The optimal weights are determined empirically in terms of the accuracy of the finger vein recognition. Experimental results show that our equal error rate (EER) is significantly lower than that obtained without the proposed method or with a conventional method.
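For readers unfamiliar with LBP, the sketch below computes the basic 8-bit code of a 3x3 patch and a weighted comparison of two codes. The neighbour ordering and the use of a weighted Hamming distance are assumptions for illustration; the abstract does not state how the paper orders the bits or applies the LA/MA/SA weights.

```python
def lbp_code(patch):
    """8-bit LBP code of the centre pixel of a 3x3 patch (list of 3 rows)."""
    centre = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= centre:
            code |= 1 << bit
    return code

def weighted_hamming(code_a, code_b, weight):
    """Number of differing bits between two codes, scaled by an area weight."""
    return weight * bin(code_a ^ code_b).count("1")

patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
code = lbp_code(patch)  # the four corners are >= centre: bits 0, 2, 4, 6
```

Under this assumed scheme, a mismatch in a vein-rich (LA) area would be given a larger `weight` than one in a vein-poor (SA) area.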
Funding: Project supported by the National Basic Research Program (973) of China (No. 2011CB706506), the National Natural Science Foundation of China (No. 50905159), the Natural Science Foundation of Jiangsu Province (No. BK2010261), and the Fundamental Research Funds for the Central Universities (No. 2011XZZX005), China.
Abstract: Recent finance and debt crises have made credit risk management one of the most important issues in financial research. Reliable credit scoring models are crucial for financial agencies to evaluate credit applications and have been widely studied in the fields of machine learning and statistics. In this paper, a novel feature-weighted support vector machine (SVM) credit scoring model is presented for credit risk assessment, in which an F-score is adopted for feature importance ranking. Considering the mutual interaction among modeling features, random forest is further introduced for relative feature importance measurement. These two feature-weighted versions of SVM are tested against the traditional SVM on two real-world datasets, and the results confirm the validity of the proposed method.
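As a hedged illustration of F-score feature ranking, the sketch below uses the commonly cited two-class definition: between-class separation of the feature means divided by the sum of the within-class sample variances. Whether the paper uses exactly this form, and how the scores enter the SVM, is not stated in the abstract; the function name is hypothetical.

```python
def f_score(pos, neg):
    """F-score of one feature from its values in the two classes.

    Numerator: squared deviations of each class mean from the overall mean.
    Denominator: sum of the unbiased sample variances of the two classes.
    """
    def mean(values):
        return sum(values) / len(values)

    def sample_var(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    return between / (sample_var(pos) + sample_var(neg))

# A well-separated feature scores high; an uninformative one scores zero.
strong = f_score([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
weak = f_score([1.0, 5.0, 9.0], [2.0, 5.0, 8.0])
```

One plausible way to use such scores, offered here only as an assumption, is to rescale each feature by its normalized score before evaluating the kernel.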
Funding: This research was supported by the National Natural Science Foundation of China (No. 11371242).
Abstract: Support vector machine (SVM) is a widely used method for classification. Proximal support vector machine (PSVM) is an extension of SVM and a promising method that leads to a fast and simple algorithm for generating a classifier. Motivated by the low computational effort of PSVM and the sparse solutions yielded by the ℓ1-norm, in this paper we first propose a PSVM with a cardinality constraint, which is eventually relaxed by the ℓ1-norm and leads to a trade-off ℓ1-ℓ2 regularized sparse PSVM. Next, we convert this ℓ1-ℓ2 regularized sparse PSVM into an equivalent ℓ1-regularized least squares (LS) form and solve it by a specialized interior-point method proposed by Kim et al. (J Sel Top Signal Process 12:1932–4553, 2007). Finally, the ℓ1-ℓ2 regularized sparse PSVM is illustrated by means of a real-world dataset taken from the University of California, Irvine Machine Learning Repository (UCI Repository). Moreover, we compare the numerical results with existing models such as generalized eigenvalue proximal SVM (GEPSVM), PSVM, and SVM-Light. The numerical results show that the ℓ1-ℓ2 regularized sparse PSVM achieves not only a better classification accuracy than GEPSVM, PSVM, and SVM-Light, but also a sparser classifier than the ℓ1-PSVM.
Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 11371242) and the "085 Project" in Shanghai University.
Abstract: Classification is a central problem in machine learning. Support vector machines (SVMs) are supervised learning models with associated learning algorithms and are used for classification in machine learning. In this paper, we establish two consensus proximal support vector machine (PSVM) models, based on methods for binary classification. The first one separates the objective function into individual convex functions according to the number of sample points in the training set. The constraints contain two types of equations, with global variables and local variables corresponding to the consensus points and the sample points, respectively. To obtain sparser solutions, the second model is the ℓ1-ℓ2 consensus PSVM, in which the objective function contains an ℓ1-norm term and an ℓ2-norm term: the ℓ2-norm term is responsible for good classification performance, while the ℓ1-norm term plays an important role in finding sparse solutions. The two consensus PSVMs are solved by the alternating direction method of multipliers. Furthermore, they are tested on real-world data taken from the University of California, Irvine Machine Learning Repository (UCI Repository) and are compared with existing models such as ℓ1-PSVM, ℓp-PSVM, GEPSVM, PSVM, and SVM-light. Numerical results show that our models outperform the others in classification accuracy and in the sparsity of the solutions.
Funding: The National High Technology Research and Development Program of China (No. 863-511-930-009).
Abstract: Support Vector Clustering (SVC) is a kernel-based unsupervised clustering method. The main drawback of SVC is its high computational complexity in obtaining the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel, which is the right criterion for generating a minimum spanning tree (MST) with Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking only the linkages between the edges that make up the main stem of the MST, for which a non-compatibility degree is newly defined to support edge selection during linkage estimation. This new approach is analyzed experimentally. The results show that the revised algorithm performs better than the proximity graph model, with faster speed, better clustering quality, and strong noise suppression, which makes SVC scalable to large data sets.
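The two ingredients named in the abstract, a Gaussian-kernel distance in feature space and Kruskal's algorithm, can be sketched as follows. This is only an illustration of those building blocks; the main-stem selection and the non-compatibility degree of the paper are not reproduced, and the kernel width `q` and function names are assumptions.

```python
import math

def kernel_distance(x, y, q=1.0):
    """Distance between the feature-space images of x and y.

    With K(x, y) = exp(-q * ||x - y||^2) and K(x, x) = 1, the squared
    feature-space distance is 2 - 2 * K(x, y).
    """
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(2.0 - 2.0 * math.exp(-q * sq))

def kruskal_mst(points, q=1.0):
    """MST edges (i, j) of the complete graph weighted by kernel_distance."""
    n = len(points)
    edges = sorted(
        (kernel_distance(points[i], points[j], q), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((i, j))
    return tree

# Two tight pairs far apart: the MST links each pair, then bridges them.
mst = kruskal_mst([(0.0, 0.0), (0.1, 0.0), (3.0, 0.0), (3.1, 0.0)])
```

Because the kernel distance is a monotone function of the input-space distance for a fixed `q`, the tree has the same edges as a Euclidean MST; the feature-space weights matter once edges are compared against cluster-boundary criteria.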
Abstract: In the last decade, several valuable studies have been conducted to discriminate fractured zones from non-fractured ones. In this paper, petrophysical and image logs from eight wells were used to detect fractured zones. Four classifiers (decision tree, random forest, support vector machine, and deep learning) were applied to the petrophysical logs and image logs for both training and testing. The outputs of the classifiers were fused by ordered weighted averaging data fusion to achieve more reliable, accurate, and general results. An accuracy of close to 99% has been achieved. This study reports a significant improvement over existing work, which has an accuracy of close to 80%.
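Ordered weighted averaging (OWA) differs from a plain weighted mean in that the weights attach to rank positions rather than to particular classifiers. A minimal sketch follows; the scores and weights are invented for illustration and do not come from the study.

```python
def owa(scores, weights):
    """Ordered weighted average of classifier scores.

    Scores are sorted in descending order first, so weights[0] always
    multiplies the largest score regardless of which classifier gave it.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))

# Four classifier outputs for one depth sample (hypothetical values).
fused = owa([0.9, 0.2, 0.7, 0.6], [0.4, 0.3, 0.2, 0.1])
```

Choosing the weight vector moves the operator between the maximum (`[1, 0, 0, 0]`), the arithmetic mean (equal weights), and the minimum (`[0, 0, 0, 1]`).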
Abstract: Support vector machines (SVMs) are an important class of machine learning methods arising from the interplay of statistical theory and optimization, and have been extensively applied to text categorization, disease diagnosis, face detection, and so on. The loss function is the core research topic of SVM, and its variational properties play an important role in the analysis of optimality conditions, the design of optimization algorithms, the representation of support vectors, and the study of dual problems. This paper summarizes and analyzes the 0-1 loss function and eighteen popular surrogate loss functions in SVM, and gives three variational properties of these loss functions: the subdifferential, the proximal operator, and the Fenchel conjugate, of which nine proximal operators and fifteen Fenchel conjugates are given in this paper.
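To make the three variational properties concrete, they can be worked out for the best-known surrogate, the hinge loss. The block below assumes the scalar form $h(t) = \max\{0, 1 - t\}$ with $t$ the margin and a proximal parameter $\lambda > 0$; the paper may use a different parameterization, so this is an illustrative derivation rather than a restatement of its results.

```latex
% Hinge loss (assumed scalar form)
h(t) = \max\{0,\, 1 - t\}

% Subdifferential
\partial h(t) =
\begin{cases}
\{-1\},   & t < 1, \\
[-1,\, 0], & t = 1, \\
\{0\},    & t > 1.
\end{cases}

% Proximal operator: \arg\min_t \; \lambda h(t) + \tfrac{1}{2}(t - v)^2
\operatorname{prox}_{\lambda h}(v) =
\begin{cases}
v + \lambda, & v < 1 - \lambda, \\
1,           & 1 - \lambda \le v \le 1, \\
v,           & v > 1.
\end{cases}

% Fenchel conjugate: \sup_t \; \{ s t - h(t) \}
h^{*}(s) =
\begin{cases}
s,       & -1 \le s \le 0, \\
+\infty, & \text{otherwise}.
\end{cases}
```

The middle case of the proximal operator follows from the optimality condition $0 \in \lambda\,\partial h(t) + t - v$ at the kink $t = 1$, and the conjugate is finite exactly on the range of slopes that appear in the subdifferential.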
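Abstract: The least squares support vector machine (LSSVM) improves the computational speed of the support vector machine (SVM) by solving a system of linear equations. However, LSSVM does not consider the influence of the margin distribution on the model, which leads to lower accuracy. To strengthen the generalization performance of LSSVM and improve its classification ability, an LSSVM with margin distribution optimization (MLSSVM) is proposed. First, the margin mean and margin variance are redefined to mine the margin distribution information of the data in depth and enhance the generalization performance of the model. Second, a weighted linear loss is introduced to further optimize the margin mean and raise the classification accuracy. Third, the objective function is analyzed and redundant terms are removed to further optimize the margin variance. Finally, the solution mechanism of LSSVM is retained to preserve computational efficiency. Experiments show that the proposed classification model has good generalization performance and running time.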
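The "system of linear equations" that gives LSSVM its speed can be written down directly. The sketch below implements the standard LSSVM classifier with an RBF kernel using NumPy; it is the baseline the abstract starts from, not the proposed MLSSVM, and the values of `gamma` and `q` as well as the function names are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, q=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-q * sq)

def lssvm_fit(X, y, gamma=10.0, q=1.0):
    """Solve the standard LSSVM system for the multipliers alpha and bias b.

    [ 0   y^T             ] [ b     ]   [ 0 ]
    [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i * y_j * K(x_i, x_j).
    """
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, q) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[1:], solution[0]

def lssvm_predict(X_train, y, alpha, b, X_new, q=1.0):
    """Sign of the decision function sum_i alpha_i y_i K(x, x_i) + b."""
    return np.sign(rbf_kernel(X_new, X_train, q) @ (alpha * y) + b)

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha, b = lssvm_fit(X, y)
pred = lssvm_predict(X, y, alpha, b, X)
```

Training reduces to one call to a dense linear solver instead of a quadratic program, at the cost of losing sparsity: in general every training sample receives a nonzero multiplier.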