Abstract
In many practical applications, unlabeled samples are easy to obtain in large quantities, whereas labeled samples are limited; semi-supervised classification algorithms have therefore attracted much attention. Building on the co-training-style semi-supervised algorithm Tri-Training, this paper proposes the Simple-Tri-Training method, which adopts two different training classifiers, and the Edit-Tri-Training method, which applies data editing to the labeled data to reduce the number of mislabeled samples. The classification results of these three methods are compared with those of the supervised SVM classifier and analyzed. The experiments show that introducing unlabeled data improves classification performance to some extent, and that the selection of the initial training set and the base classifiers, together with data editing during the labeling process, are key factors affecting the stability and performance of semi-supervised classification.
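The agreement rule at the heart of Tri-Training can be illustrated with a minimal sketch. It assumes a single labeling round and omits the error-rate checks of the full algorithm; it does not reproduce the Simple-Tri-Training or Edit-Tri-Training variants, whose details are not given in this abstract. The names `NearestCentroid`, `tri_training_round`, and `predict_majority` are hypothetical, and the nearest-centroid learner is only a stand-in for whatever base classifiers the paper uses.

```python
import random


class NearestCentroid:
    """Tiny stand-in base learner (an assumption, not the paper's classifier)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        # One centroid per class seen in the training data.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        # Return the class whose centroid is closest in squared distance.
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )


def tri_training_round(X_lab, y_lab, X_unlab, seed=0):
    """One simplified Tri-Training round: returns three retrained classifiers."""
    rng = random.Random(seed)
    n = len(X_lab)
    # Train three initial classifiers on bootstrap samples of the labeled set.
    models = []
    for _ in range(3):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            NearestCentroid().fit([X_lab[i] for i in idx], [y_lab[i] for i in idx])
        )
    # For each classifier, add the unlabeled samples on which the OTHER two
    # classifiers agree, using their shared prediction as a pseudo-label.
    retrained = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        X_new, y_new = list(X_lab), list(y_lab)
        for x in X_unlab:
            pj, pk = models[j].predict_one(x), models[k].predict_one(x)
            if pj == pk:
                X_new.append(x)
                y_new.append(pj)
        retrained.append(NearestCentroid().fit(X_new, y_new))
    return retrained


def predict_majority(models, x):
    """Final prediction by majority vote of the three classifiers."""
    votes = [m.predict_one(x) for m in models]
    return max(set(votes), key=votes.count)
```

Pseudo-labels that the two agreeing classifiers get wrong are added to the training set unchecked in this sketch; that is the kind of mislabeled data that a data-editing step would aim to filter out.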
Source
《计算机技术与发展》
2013, No. 7, pp. 77-79, 83 (4 pages)
Computer Technology and Development
Funding
Supported by the Yunnan Provincial Education Scientific Research Fund (云南省教育科研基金资助项目, 2010Y290)