Research on PU Learning Based on an Associative Classification Algorithm (基于关联分类算法的PU学习研究) Cited by: 1

Evaluating PU Learning Based on Associative Classification Algorithm
Abstract [Objective] This paper studies PU learning based on CBA, a widely used associative classification algorithm. [Methods] A proportion ? of the positive examples in the training set is treated as unidentified positives and combined with the negative examples to form the unlabeled set, which constructs the PU learning scenario. Examples are classified using all class association rules of the positive class, and the reliability of each rule's classification result is measured by the rule's relative confidence. [Results] With ? set to 0, 0.3, 0.6, and 0.9, the AUC of the proposed method on the experimental datasets is on average 6.21%, 11.15%, 13.50%, and 16.56% higher than that of CBA, respectively, and 11.27%, 15.03%, 12.22%, and 7.37% higher than that of POSC4.5. [Limitations] The proportion of true positives among all examples was not estimated, and the confidence of the class association rules was not corrected accordingly, so the classification performance of the proposed method declines as ? increases. In addition, CBA generates a large number of redundant rules, which this study did not filter. [Conclusions] The proposed method outperforms CBA and POSC4.5 in the PU learning scenario.
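The PU scenario described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_pu_labels` and the parameter `hidden_fraction` (standing in for the proportion shown as ? in the abstract) are hypothetical, and hidden positives are chosen uniformly at random, which the abstract does not specify.

```python
import random


def make_pu_labels(y, hidden_fraction, seed=0):
    """Build PU labels from fully labeled data.

    y: list of true labels, 1 = positive, 0 = negative.
    hidden_fraction: share of positives to hide among the unlabeled examples
        (hypothetical name for the proportion in the abstract).
    Returns a list where 1 = labeled positive and 0 = unlabeled.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    n_hidden = round(hidden_fraction * len(pos_idx))
    # Assumption: the hidden positives are sampled uniformly at random.
    hidden = set(rng.sample(pos_idx, n_hidden))
    # Hidden positives and all negatives together form the unlabeled set.
    return [1 if (label == 1 and i not in hidden) else 0
            for i, label in enumerate(y)]


# Toy data: 10 positives followed by 5 negatives.
y = [1] * 10 + [0] * 5
s = make_pu_labels(y, hidden_fraction=0.3)
```

With `hidden_fraction=0`, the unlabeled set contains only the true negatives, so the PU labels coincide with the original labels.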
Authors: 杨建林 (Yang Jianlin), 刘扬 (Liu Yang)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), CSSCI, CSCD, 2017, No. 11, pp. 12-18 (7 pages)
Keywords: Associative Classification; PU Learning; CBA Algorithm