Abstract
[Objective] This paper studies PU (positive and unlabeled) learning based on CBA, a widely used associative classification algorithm. [Methods] To build a PU learning scenario, a proportion ? of the positive examples in the training set is treated as unidentified positives and combined with the negative examples to form the unlabeled set. Examples are then classified using all positive-class classification association rules, and the relative confidence of each classification association rule measures how reliable its classification result is. [Results] With ? set to 0, 0.3, 0.6, and 0.9, the AUC of the proposed method on the experimental datasets was on average 6.21%, 11.15%, 13.50%, and 16.56% higher than that of CBA, and 11.27%, 15.03%, 12.22%, and 7.37% higher than that of POSC4.5, respectively. [Limitations] The method does not estimate the proportion of true positives among all examples, so it cannot use such an estimate to correct the confidence of the classification association rules. As a result, its classification performance declines as ? increases. In addition, CBA generates a large number of redundant rules, which this paper does not prune. [Conclusions] In the PU learning scenario, the proposed method classifies better than both CBA and POSC4.5.
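The PU scenario described in [Methods] can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `build_pu_scenario` and `positive_rule_confidence` are hypothetical. Examples are assumed to be sets of attribute=value items, and the hidden positives are assumed to be chosen at random. The confidence computed here is the ordinary CBA-style rule confidence, with unlabeled examples counted as negatives. It is not the paper's relative confidence, whose definition the abstract does not give. The sketch shows why hidden positives lower the confidence of positive-class rules, which relates to the limitation noted above.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: each example is a set of "attribute=value" items.
Item = FrozenSet[str]


def build_pu_scenario(examples: Sequence[Item], labels: Sequence[int],
                      hidden_ratio: float, seed: int = 0
                      ) -> Tuple[List[Item], List[Item], List[int]]:
    """Hypothetical helper: hide a proportion of positives in the unlabeled set.

    labels: 1 = positive, 0 = negative.
    Returns (labeled positives, unlabeled examples, true labels of unlabeled).
    """
    if not 0.0 <= hidden_ratio < 1.0:
        raise ValueError("hidden_ratio must be in [0, 1)")
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    # Assumption: the "unidentified" positives are drawn uniformly at random.
    hidden = set(rng.sample(pos_idx, round(hidden_ratio * len(pos_idx))))
    positive = [examples[i] for i in pos_idx if i not in hidden]
    # Unlabeled set = hidden positives + all negatives.
    unlabeled_idx = sorted(hidden) + neg_idx
    unlabeled = [examples[i] for i in unlabeled_idx]
    true_labels = [labels[i] for i in unlabeled_idx]
    return positive, unlabeled, true_labels


def positive_rule_confidence(antecedent: Item, positive: Sequence[Item],
                             unlabeled: Sequence[Item]) -> float:
    """Ordinary confidence of the rule `antecedent -> positive` on PU data.

    Unlabeled examples are treated as negatives (NOT the paper's relative
    confidence, which is not defined in the abstract).
    """
    covered_pos = sum(1 for t in positive if antecedent <= t)
    covered_unl = sum(1 for t in unlabeled if antecedent <= t)
    covered = covered_pos + covered_unl
    return covered_pos / covered if covered else 0.0


if __name__ == "__main__":
    # Toy data: every positive contains "a", every negative contains "b".
    pos = [frozenset({"a", f"p{i}"}) for i in range(10)]
    neg = [frozenset({"b", f"n{i}"}) for i in range(10)]
    P, U, y_u = build_pu_scenario(pos + neg, [1] * 10 + [0] * 10, 0.3)
    print(len(P), len(U), sum(y_u))  # 7 13 3
    print(positive_rule_confidence(frozenset({"a"}), P, U))  # 0.7
```

On the full labeled data, the rule {a} -> positive has confidence 1.0. After 30% of the positives are hidden, it drops to 0.7, because the hidden positives are counted as negatives. Correcting confidence with an estimated true-positive proportion, which the paper leaves as future work, would target exactly this bias.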
Source
《数据分析与知识发现》
CSSCI
CSCD
2017, No. 11, pp. 12-18 (7 pages)
Data Analysis and Knowledge Discovery
Keywords
Associative Classification
PU Learning
CBA Algorithm