Abstract
Rough set theory is an effective supervised learning model that usually requires a sufficient amount of labeled data to train a classifier. In many practical problems, however, unlabeled data are plentiful while labeled data are scarce because labeling is expensive. Combining active learning with co-training, this paper proposes a semi-supervised rough set model that exploits unlabeled data to improve classification performance on partially labeled data. The model first uses a semi-supervised attribute reduction algorithm to extract two highly diverse reducts and trains a base classifier on the labeled data with each. Following the principle of active learning, it then selects from the unlabeled data the samples on which the two base classifiers disagree most and submits them for manual labeling. Finally, the updated classifiers learn from each other through co-training, each labeling the unlabeled samples it is confident about for its counterpart. Comparative experiments on UCI datasets show that the model clearly improves classification performance and can even reach the optimal value for a dataset.
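The abstract outlines the pipeline (two diverse reducts, two base classifiers, disagreement-driven queries, mutual labeling) but does not give the paper's semi-supervised reduction algorithm, base classifier, or selection thresholds. The Python sketch here illustrates the general idea under explicit assumptions. All function names (`discernibility_entries`, `greedy_reduct`, `predict`, `active_cotrain`) and the `oracle` callback are hypothetical. Reducts are computed from the labeled objects only with a plain discernibility matrix and a Johnson-style greedy hitting-set heuristic, so this is not the paper's semi-supervised reduction. Diversity comes from an `avoid` preference for attributes outside the first reduct. The base classifier is an assumed nearest-match majority vote on the reduct attributes. Each round queries one disputed sample and lets each view hand one confident sample to the other.

```python
from collections import Counter
from itertools import combinations


def discernibility_entries(X, y):
    """Non-empty discernibility-matrix entries: for every pair of objects with
    different labels, the set of attribute indices on which they differ."""
    entries = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            diff = frozenset(a for a in range(len(X[i])) if X[i][a] != X[j][a])
            if diff:  # skip inconsistent pairs (identical attribute values)
                entries.append(diff)
    return entries


def greedy_reduct(entries, avoid=frozenset()):
    """Greedy (Johnson-style) hitting set of the entries, preferring attributes
    outside `avoid`, then pruned to a minimal hitting set (a relative reduct)."""
    chosen, uncovered = set(), list(entries)
    while uncovered:
        counts = Counter(a for e in uncovered for a in e)
        pool = {a: c for a, c in counts.items() if a not in avoid} or counts
        best = max(sorted(pool), key=lambda a: pool[a])  # ties -> lowest index
        chosen.add(best)
        uncovered = [e for e in uncovered if best not in e]
    for a in sorted(chosen):  # drop attributes whose entries are hit by others
        if all(e & (chosen - {a}) for e in entries):
            chosen.discard(a)
    return frozenset(chosen)


def predict(train, reduct, x):
    """Hypothetical base classifier: majority label among training objects
    closest to x on the reduct attributes (distance 0 = same equivalence class)."""
    def dist(z):
        return sum(z[a] != x[a] for a in reduct)
    best = min(dist(z) for z, _ in train)
    votes = Counter(lbl for z, lbl in train if dist(z) == best)
    label, n = max(sorted(votes.items()), key=lambda kv: kv[1])
    confidence = n / sum(votes.values()) / (1 + best)  # penalize inexact matches
    return label, confidence


def active_cotrain(X_lab, y_lab, X_unl, oracle, rounds=5, queries_per_round=1):
    """Hypothetical active co-training loop over two reduct-based views."""
    entries = discernibility_entries(X_lab, y_lab)
    r1 = greedy_reduct(entries)
    r2 = greedy_reduct(entries, avoid=r1)  # second, diverse view
    L1 = list(zip(X_lab, y_lab))
    L2 = list(L1)
    pool = list(range(len(X_unl)))
    n_queries = 0
    for _ in range(rounds):
        if not pool:
            break
        preds = {i: (predict(L1, r1, X_unl[i]), predict(L2, r2, X_unl[i])) for i in pool}
        # Active learning: ask the oracle to label samples on which the views disagree.
        disputed = [i for i in pool if preds[i][0][0] != preds[i][1][0]]
        for i in disputed[:queries_per_round]:
            label = oracle(i)
            L1.append((X_unl[i], label))
            L2.append((X_unl[i], label))
            pool.remove(i)
            n_queries += 1
        # Co-training: each view labels its most confident remaining sample for the other.
        for view, target in ((0, L2), (1, L1)):
            if not pool:
                break
            i = max(pool, key=lambda k: preds[k][view][1])
            target.append((X_unl[i], preds[i][view][0]))
            pool.remove(i)
    return r1, r2, L1, L2, n_queries


if __name__ == "__main__":
    # Toy data: on the labeled part attribute 2 copies attribute 0, and label = attribute 0.
    X_lab = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1)]
    y_lab = [0, 1, 0, 1]
    X_unl = [(0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0)]
    truth = [0, 1, 0, 1]  # stands in for the human annotator
    r1, r2, L1, L2, q = active_cotrain(X_lab, y_lab, X_unl, oracle=lambda i: truth[i])
    print(sorted(r1), sorted(r2), q, len(L1), len(L2))  # [0] [2] 2 7 7
```

On this toy data the two views rely on attribute 0 and attribute 2 respectively. The oracle is consulted only for the two unlabeled samples where those attributes conflict, while the agreeing samples are labeled automatically by co-training.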
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
EI
CSCD
Peking University Core Journal
2012, No. 5, pp. 745-754 (10 pages)
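Funding
National Natural Science Foundation of China (Nos. 60970061, 61075056, 61103067)
China Postdoctoral Science Foundation (Nos. 2011M500626, 2011M500815)
Shanghai Leading Academic Discipline Project (No. B004)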
Keywords
Rough Set, Discernibility Matrix, Semi-Supervised Reduction, Active Learning, Co-Training