Evaluating PU Learning Based on Associative Classification Algorithm (基于关联分类算法的PU学习研究)

Cited by: 1

Abstract: [Objective] To study PU learning (learning from positive and unlabeled examples) using CBA, a widely used associative classification algorithm. [Methods] A given proportion of the positive examples in the training set is treated as unidentified positives and combined with the negative examples to form the unlabeled set, which creates the PU learning scenario. Examples are classified using all class association rules for the positive class, and the relative confidence of each class association rule is used to measure how reliable its classification result is. [Results] With the proportion of hidden positives set to 0, 0.3, 0.6, and 0.9, the AUC of the proposed method on the experimental datasets was on average 6.21%, 11.15%, 13.50%, and 16.56% higher than that of CBA, and 11.27%, 15.03%, 12.22%, and 7.37% higher than that of POSC4.5. [Limitations] The method does not estimate the proportion of true positives among all examples and correct the confidence of the class association rules accordingly, so its classification performance declines as the proportion of hidden positives grows. In addition, CBA generates many redundant rules, which were not filtered in this study. [Conclusions] In the PU learning scenario, the proposed method classifies better than both CBA and POSC4.5.
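The construction of the PU learning scenario described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `make_pu_dataset`, the parameter `hidden_fraction`, the 1/0 label encoding, and the use of a seeded shuffle to pick which positives are hidden are all assumptions made for illustration. The rule mining and relative-confidence scoring are not shown, since the abstract does not define them.

```python
import random


def make_pu_dataset(samples, labels, hidden_fraction, seed=0):
    """Turn a fully labeled binary dataset into a PU learning scenario.

    A fraction `hidden_fraction` of the positive samples (label 1) is treated
    as unidentified positives and merged with all negatives (label 0) to form
    the unlabeled set. The remaining positives stay labeled.
    The selection strategy (random, seeded) is an assumption of this sketch.
    """
    if not 0.0 <= hidden_fraction <= 1.0:
        raise ValueError("hidden_fraction must be in [0, 1]")

    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]

    # Randomly choose which positives lose their label.
    rng.shuffle(positives)
    n_hidden = round(hidden_fraction * len(positives))
    hidden_pos, labeled_pos = positives[:n_hidden], positives[n_hidden:]

    # Unlabeled set = hidden positives + all negatives, mixed together.
    unlabeled = hidden_pos + negatives
    rng.shuffle(unlabeled)
    return labeled_pos, unlabeled


if __name__ == "__main__":
    samples = list(range(15))        # toy sample identifiers
    labels = [1] * 10 + [0] * 5      # 10 positives, 5 negatives
    for fraction in (0.0, 0.3, 0.6, 0.9):
        labeled_pos, unlabeled = make_pu_dataset(samples, labels, fraction)
        print(fraction, len(labeled_pos), len(unlabeled))
```

With a hidden fraction of 0 the unlabeled set contains only negatives, so the task reduces to ordinary supervised classification. Larger fractions leave fewer labeled positives and a more contaminated unlabeled set.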

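Authors: Yang Jianlin, Liu Yang

Affiliation: School of Information Management, Nanjing University; Jiangsu Key Laboratory of Data Engineering and Knowledge Service

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), CSSCI, CSCD, 2017, Issue 11, pp. 12-18 (7 pages)

Keywords: Associative Classification; PU Learning; CBA Algorithm

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]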

