Support vector machines (SVMs) aim to find an optimal separating hyperplane that maximizes the margin between two classes of training examples. The choice of the cost parameter when training an SVM model is a critical issue. This analysis studies how the cost parameter determines the hyperplane, especially for classification using only positive and unlabeled data. An algorithm is given that computes the entire solution path, choosing the 'best' cost parameter while training the SVM model. The algorithm's performance is compared with conventional implementations that use default values for the cost parameter on two synthetic data sets and two real-world data sets. The results show that the algorithm achieves better results on positive and unlabeled classification.
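To illustrate the role the cost parameter plays, the following sketch trains a linear SVM with a Pegasos-style sub-gradient method (a standard technique, not the solution-path algorithm described in the paper). Here the regularization weight `lam` acts as the inverse of the cost parameter C: a small `lam` (large C) penalizes margin violations heavily, while a large `lam` (small C) favors a wider but softer margin. All function and variable names are illustrative.

```python
import random

def train_linear_svm(points, labels, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM (no bias term) with Pegasos-style sub-gradient steps.

    `lam` is the regularization weight; it plays the inverse role of the
    SVM cost parameter C (small `lam` ~ large C: fit the data harder).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    w = [0.0] * dim
    t = 0
    idx = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, points[i]))
            # Shrink w toward zero (the regularization term's gradient).
            decay = 1.0 - eta * lam
            w = [decay * wj for wj in w]
            if margin < 1.0:  # point violates the margin: step toward it
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, points[i])]
    return w

def predict(w, x):
    """Classify x by the sign of the decision function w . x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

For example, on a small linearly separable set such as `X = [(2, 2), (3, 1), (-2, -2), (-1, -3)]` with labels `[1, 1, -1, -1]`, training with a small `lam` recovers a hyperplane that separates the two classes; sweeping `lam` (equivalently, C) over a grid shows how the boundary tilts as margin violations are penalized more or less heavily, which is exactly the sensitivity the paper's solution-path approach is designed to handle in one pass.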
Funding: Supported by the National Natural Science Foundation of China (Nos. 90604025 and 60703059) and the Chinese Young Faculty Research Fund (No. 20070003093).