Abstract: Big data refers to data sets that, because of their size or complexity, cannot be stored or processed with conventional data management tools or applications, and the term has become prominent in recent years with the rapid development of technology. Almost immediately thereafter, the term "big data mining" emerged, i.e., mining from big data, as an emerging and interconnected field of research. Classification is an important stage in data mining since it helps people make better decisions in a variety of situations, including scientific endeavors, biomedical research, and industrial applications. The probabilistic neural network (PNN) is a commonly used and successful method for handling classification and pattern recognition problems. In this study, the authors propose combining the PNN, a data mining technique, with the vibrating particles system (VPS), a metaheuristic algorithm, to solve classification problems more effectively; the combined method is named "VPS-PNN". The suggested method was tested on eleven common benchmark medical datasets from the machine-learning library. The suggested VPS-PNN mechanism outperforms the PNN, biogeography-based optimization, the enhanced water cycle algorithm (E-WCA), and the firefly algorithm (FA) in terms of convergence speed and classification accuracy.
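For readers unfamiliar with the PNN, the following is a minimal sketch of a basic Gaussian (Parzen-window) PNN classifier in Python with NumPy. It is not the VPS-PNN method of the abstract: the function name `pnn_predict`, the smoothing parameter `sigma`, and the toy data are all assumptions made for illustration, and the abstract does not state which PNN parameters the VPS search adjusts.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Classify X_test with a basic PNN (hypothetical helper, illustration only)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # Pattern layer: Gaussian kernel between every test point and training point.
    sq_dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    activations = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Summation layer: average activation over the training points of each class.
    scores = np.stack(
        [activations[:, y_train == c].mean(axis=1) for c in classes], axis=1
    )
    # Decision layer: choose the class with the largest averaged activation.
    return classes[scores.argmax(axis=1)]

# Toy data (made up): two well-separated clusters.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.1, 0.0], [1.0, 0.9]])
pred = pnn_predict(X_train, y_train, X_test)
print(pred)
```

In this sketch `sigma` is the only free parameter, which makes it a natural candidate for tuning by a search procedure such as a metaheuristic.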
Abstract: This paper offers a symbiosis-based hybrid modified DNA-ABC optimization algorithm, which combines modified DNA concepts and the artificial bee colony (ABC) algorithm to aid hierarchical fuzzy classification. According to the literature, the ABC algorithm is traditionally applied to constrained and unconstrained problems; in the present research it is combined with modified DNA concepts and implemented for fuzzy classification. Moreover, to the best of our knowledge, previous research on the ABC algorithm has not combined it with DNA computing for hierarchical fuzzy classification to explore the merits of cooperative coevolution. Therefore, this paper is the first to apply the mechanism of symbiosis to create a hybrid modified DNA-ABC algorithm for hierarchical fuzzy classification applications. In this study, the partition number and the shape of the membership function are extracted by the symbiosis-based hybrid modified DNA-ABC optimization algorithm, which provides both sufficient global exploration and adequate local exploitation for hierarchical fuzzy classification. The proposed optimization algorithm is applied to five benchmark University of California, Irvine (UCI) data sets, and the results demonstrate the efficiency of the algorithm.
Funding: Partly supported by the Major Project of the National Social Science Foundation of China under Grant No. 18VZL006; the National Natural Science Foundation of China under Grant Nos. 71571126 and 71974139; the Excellent Youth Foundation of Sichuan Province under Grant No. 20JCQN0225; the Tianfu Ten-thousand Talents Program of Sichuan Province; the Excellent Youth Foundation of Sichuan University under Grant No. sksyl201709; the Leading Cultivation Talents Program of Sichuan University; the Teacher and Student Joint Innovation Project of Business School of Sichuan University under Grant No. LH2018011; and the 2018 Special Project for Cultivation and Innovation of New Academic Qian Platform Talent under Grant No. 5772-012.
Abstract: In many practical classification problems, datasets contain a portion of outliers, which can greatly affect the performance of the constructed models. To address this issue, we apply the group method of data handling (GMDH) neural network to outlier detection. This study builds a GMDH-based outlier detection (GOD) model. The model first implements feature selection in the training set L using a GMDH neural network. Then a new training set L can be obtained by mapping the selected key feature subset. Next, a linear regression model is constructed on the set L by ordinary least squares estimation. Further, the model eliminates one sample from the set L at random each time and rebuilds the linear regression model. Finally, outlier detection is realized by calculating Cook's distance for each sample. Four different customer classification datasets are used to conduct experiments. Results show that the GOD model can effectively eliminate outliers and that, compared with five existing outlier detection models, it generally performs significantly better. This indicates that eliminating outliers can effectively enhance the classification accuracy of the trained classification model.
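The last two steps of the abstract (an ordinary least squares fit followed by Cook's distance for each sample) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' GOD model: the GMDH feature selection step is omitted, the function name `cooks_distance` and the synthetic data are assumptions, and the closed-form expression is used in place of explicitly refitting the model with each sample removed (the two give the same value for Cook's distance).

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance of every sample under an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])       # design matrix with intercept
    p = A.shape[1]                             # number of fitted coefficients
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverages: diagonal of the hat matrix A A^+.
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    s2 = resid @ resid / (n - p)               # residual variance estimate
    return resid ** 2 / (p * s2) * leverage / (1.0 - leverage) ** 2

# Synthetic data (made up): a linear signal with one corrupted response.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=50)
y[7] += 10.0                                   # inject an outlier at index 7
D = cooks_distance(X, y)
suspect = int(np.argmax(D))
print(suspect)
```

A sample would then be flagged when its distance exceeds a chosen cutoff; the cutoff itself is a design choice that the abstract does not specify.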
Funding: The authors thank the National Natural Science Foundation of China (NSFC) for financial support under Grant No. 11702238.
Abstract: This paper applies a machine learning technique to find a general and efficient numerical integration scheme for boundary element methods. A model based on the neural network multi-classification algorithm is constructed to find the minimum number of Gaussian quadrature points satisfying a given accuracy. The constructed model is trained on a large amount of data calculated with the traditional boundary element method, and the optimal network architecture is selected. The two-dimensional potential problem of a circular structure is tested and analyzed with the determined model, and the accuracy of the model is about 90%. Finally, by incorporating the predicted Gaussian quadrature points into the boundary element analysis, we find that the numerical solution and the analytical solution are in good agreement, which verifies the robustness of the proposed method.
Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 11371242) and the "085 Project" in Shanghai University.
Abstract: Classification is a central problem in machine learning. Support vector machines (SVMs) are supervised learning models with associated learning algorithms and are used for classification in machine learning. In this paper, we establish two consensus proximal support vector machine (PSVM) models for binary classification. The first separates the objective function into individual convex functions according to the number of sample points in the training set. The constraints contain two types of equations, with global variables and local variables corresponding to the consensus points and sample points, respectively. To obtain sparser solutions, the second model is an $\ell_1$–$\ell_2$ consensus PSVM in which the objective function contains an $\ell_1$-norm term and an $\ell_2$-norm term; the $\ell_2$-norm term is responsible for good classification performance, while the $\ell_1$-norm term plays an important role in finding sparse solutions. The two consensus PSVMs are solved by the alternating direction method of multipliers. Furthermore, they are implemented on real-world data taken from the University of California, Irvine Machine Learning Repository (UCI Repository) and are compared with existing models such as $\ell_1$-PSVM, $\ell_p$-PSVM, GEPSVM, PSVM, and SVM-light. Numerical results show that our models outperform the others in classification accuracy and in the sparsity of the solutions.
Abstract: Background: The 11th revision of the International Classification of Diseases and Related Health Problems (ICD-11) was released on June 18, 2018, by the World Health Organization and will come into effect on January 1, 2022. Apart from the chapters on the classification of diseases in conventional medicine (CM), a new chapter, traditional medicine (TM) conditions–Module 1, was added. Low back pain (LBP) is one of the common reasons for physician visits. The classification codes for LBP in the ICD-11 are vital to documenting accurate clinical diagnoses. Methods: The qualitative case study method was adopted. Secondary-use data for 100 patients were randomly selected, and the ICD-11 online interface was used to find the classification codes for LBP diagnosis in both the CM section and the TM Conditions–Module 1 (TM1) section. Results: Of the 27 codes obtained from the CM section, six codes were not relevant to LBP, whereas the other 21 codes represented diagnoses of LBP and its related diseases or syndromes. In the TM1 section, six codes for different patterns and disorders represented the diagnoses for LBP from the TM perspective. Conclusion: This study indicates that specific diagnoses of LBP can be represented by the combination of CM classification codes and TM1 classification codes in the ICD-11; the CM codes represent specific and accurate clinical diagnoses for LBP, whereas the TM1 codes add more accuracy to the diagnoses of different patterns from the TM perspective.
Abstract: In machine learning, the support vector machine (SVM) is a method of classification. For non-linearly separable data, kernel functions are a basic ingredient of the SVM technique. In this paper, we briefly recall some useful results on the decomposition of reproducing kernel Hilbert spaces (RKHS). Based on orthogonal polynomial theory and Mercer's theorem, we construct the high power Legendre polynomial kernel on the cube $[-1,1]^d$. Following a presentation of the theoretical background of SVM, we evaluate the performance of this kernel on some illustrative examples in comparison with RBF, linear, and polynomial kernels.
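As an illustration of how a kernel can be built from Legendre polynomials on $[-1,1]^d$, here is a minimal sketch in Python with NumPy. It implements a generic product kernel $K(x,z)=\prod_{j=1}^{d}\sum_{k=0}^{n}P_k(x_j)P_k(z_j)$, which is an assumption chosen for illustration; the abstract does not give the definition of the paper's "high power" kernel, so this should not be read as that construction. The function name `legendre_product_kernel` and the parameter `degree` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_product_kernel(X, Z, degree=3):
    """Gram matrix of a generic Legendre product kernel (illustration only)."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # legvander evaluates P_0..P_degree at every entry: shape (n, d, degree + 1).
    PX = legendre.legvander(X, degree)
    PZ = legendre.legvander(Z, degree)
    # Per-coordinate 1-D kernel sum_k P_k(x_j) P_k(z_j): shape (n, m, d).
    per_dim = np.einsum("idk,jdk->ijd", PX, PZ)
    # Multiply the 1-D kernels over the d coordinates.
    return per_dim.prod(axis=2)

# Points in the cube [-1, 1]^2 (made up).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(6, 2))
K = legendre_product_kernel(X, X)
corner = legendre_product_kernel([[1.0, 1.0]], [[1.0, 1.0]])
print(K.shape, corner)
```

Each one-dimensional factor is an inner product of feature vectors, and a product of positive semidefinite kernels is again positive semidefinite, so the resulting Gram matrix could be passed to an SVM solver that accepts precomputed kernels.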
Abstract: The existence of bounded and unbounded solutions to the nonlinear reaction-diffusion problem $u_t = \Delta\Phi(u) + F(u,x,t)$ with initial or initial-boundary conditions is discussed, where $u = u(x,t)$, $x \in R$. Simple criteria are given.
Funding: Supported by Shandong Provincial Natural Science Foundation, China (Grant No. ZR2014JL001); the Shandong Province Higher Educational Science and Technology Program (Grant No. J13LI04); and the Excellent Young Scholars Research Fund of Shandong Normal University of China.
Abstract: An edge-coloring of a graph $G$ is an assignment of colors to all the edges of $G$. A $g_c$-coloring of a graph $G$ is an edge-coloring of $G$ such that each color appears at each vertex $v$ at least $g(v)$ times. The maximum integer $k$ such that $G$ has a $g_c$-coloring with $k$ colors is called the $g_c$-chromatic index of $G$ and is denoted by $\chi'_{g_c}(G)$. In this paper, we extend a result on edge-covering coloring of Zhang and Liu in 2011, and give a new sufficient condition for a simple graph $G$ to satisfy $\chi'_{g_c}(G) = \delta_g(G)$, where $\delta_g(G) = \min_{v \in V(G)} \lfloor d(v)/g(v) \rfloor$.
Funding: the National Natural Science Foundation of China (No. 11371242).
Abstract: Logistic regression has proved to be a promising method for machine learning, where it addresses the problem of classification. In this paper, we present an $l_1$-$l_2$-regularized logistic regression model, where the $l_1$-norm is responsible for yielding a sparse logistic regression classifier and the $l_2$-norm for keeping better classification accuracy. To solve the $l_1$-$l_2$-regularized logistic regression model, we develop an alternating direction method of multipliers with an embedded limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method. Furthermore, we implement our model for binary classification problems using real data examples selected from the University of California, Irvine Machine Learning Repository (UCI Repository). We compare our numerical results with those obtained by the well-known LIBSVM and SVM-Light software. The numerical results show that our $l_1$-$l_2$-regularized logistic regression model achieves better classification and less CPU time.
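To make the role of the two penalties concrete, the sketch below fits an $l_1$-$l_2$-regularized logistic regression in Python with NumPy. It deliberately swaps in plain proximal gradient descent (soft-thresholding) for the ADMM with embedded L-BFGS solver described in the abstract, so it illustrates the model rather than the authors' algorithm. The objective form (mean logistic loss plus `lam1` times the $l_1$-norm plus `lam2/2` times the squared $l_2$-norm), the omission of an intercept, the names `fit_l1_l2_logistic`, `lam1`, `lam2`, and the synthetic data are all assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1_l2_logistic(X, y, lam1=0.01, lam2=0.01, n_iter=500):
    """Proximal gradient fit; labels y must be in {-1, +1}; no intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Step size 1/L, with L a Lipschitz bound for the smooth part's gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / (4.0 * n) + lam2)
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)
        # sigmoid(-margins), written with tanh for numerical stability.
        prob_wrong = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -(X.T @ (y * prob_wrong)) / n + lam2 * w
        # Gradient step on the smooth part, then the l1 proximal step.
        w = soft_threshold(w - step * grad, step * lam1)
    return w

# Synthetic data (made up): only the first two of five features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(2.0 * X[:, 0] - X[:, 1] >= 0.0, 1.0, -1.0)

w = fit_l1_l2_logistic(X, y)
accuracy = float(np.mean(np.sign(X @ w) == y))
w_heavy = fit_l1_l2_logistic(X, y, lam1=10.0)  # very strong l1 penalty
print(w, accuracy, w_heavy)
```

With a very large `lam1` the soft-thresholding step keeps every coefficient at zero, which shows the sparsity-inducing effect of the $l_1$ term in its most extreme form.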
Funding: the CERNET Innovation Project (No. NGII20190315) and the Foundation of A Hundred Youth Talents Training Program of Lanzhou Jiaotong University.
Abstract: When working with unbalanced data, the traditional random forest algorithm cannot achieve satisfactory prediction results for the minority class and suffers from the parameter selection dilemma. In view of this problem, this paper proposes an unbalanced accuracy weighted random forest algorithm (UAW_RF) based on adaptive step size artificial bee colony optimization. It combines the ideas of decision tree optimization, sampling selection, and weighted voting to improve the ability of the random forest algorithm to classify biased data. The adaptive step size and the optimal solution were introduced to improve the position updating formula of the artificial bee colony algorithm, and the parameter combination of the random forest algorithm was then iteratively optimized with the improved algorithm. Experimental results show satisfactory accuracies and indicate that the method can effectively improve the classification accuracy of the random forest algorithm.