Abstract: Support vector machines (SVMs) are less favored for large-scale data mining than for pattern recognition and machine learning, because SVM training complexity depends heavily on the size of the data set. This paper presents a geometric distance-based SVM (GDB-SVM). It uses the distance between a point and a classification hyperplane as its classification rule, and its design rests on theoretical analysis and geometric intuition. The experimental code is derived from LibSVM and was edited and compiled with Microsoft Visual C++ 6.0. GDB-SVM gives better results than the one-against-all (OAA) method on four of the five predictions, and better results than the one-against-one (OAO) method on three of the five. Experiments on real data sets show that GDB-SVM not only outperforms OAA and OAO but also scales well to large data sets while maintaining high classification accuracy.
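The abstract does not spell out GDB-SVM's exact decision rule. The Python sketch below rests on two assumptions. First, there is one linear hyperplane w_k·x + b_k = 0 per class, as in OAA. Second, a point x is assigned to the class with the largest signed geometric distance (w_k·x + b_k) / ||w_k||. The sketch contrasts this rule with the common practice of picking the largest raw decision value w_k·x + b_k. The function names and the hand-picked weights are hypothetical and are not taken from the paper or from LibSVM.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: (w, b) defines the hyperplane w·x + b = 0.
Hyperplane = Tuple[List[float], float]


def functional_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Raw decision value f(x) = w·x + b, as a linear SVM outputs it."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def geometric_distance(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed Euclidean distance from x to the hyperplane w·x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be non-zero to define a hyperplane")
    return functional_value(w, b, x) / norm


def predict_by_functional_value(hyperplanes: Sequence[Hyperplane], x: Sequence[float]) -> int:
    """Baseline OAA-style rule: class whose raw decision value is largest."""
    scores = [functional_value(w, b, x) for w, b in hyperplanes]
    return max(range(len(scores)), key=scores.__getitem__)


def predict_by_distance(hyperplanes: Sequence[Hyperplane], x: Sequence[float]) -> int:
    """Assumed distance-based rule: class whose hyperplane x lies farthest
    from, on the positive side (largest signed geometric distance)."""
    dists = [geometric_distance(w, b, x) for w, b in hyperplanes]
    return max(range(len(dists)), key=dists.__getitem__)


if __name__ == "__main__":
    # Hand-picked (not trained) hyperplanes for classes 0 and 1.
    planes = [([2.0, 0.0], -1.0), ([0.0, 0.5], 0.0)]
    x = [1.0, 1.5]
    print(predict_by_functional_value(planes, x))  # raw scores 1.0 vs 0.75 -> 0
    print(predict_by_distance(planes, x))          # distances 0.5 vs 1.5 -> 1
```

Dividing by ||w_k|| puts every binary classifier's output on the same geometric scale. In the example, this is why the two rules disagree: class 0 has the larger raw score, but the point lies farther from class 1's hyperplane. For a kernel SVM, ||w_k|| would have to be computed in feature space from the support vectors and kernel values, which this linear sketch omits.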