Research on association rule mining for minority classes in unbalanced data sets based on confabulation theory (cited by: 9)
Abstract: In network intrusion detection systems, data mining often has to work with unbalanced data sets, and mining the minority classes in such data sets is an active research topic. To address this problem, the paper proposes ARUD, an algorithm for association rule mining from unbalanced databases. The algorithm first constructs a knowledge link strength matrix that stores the support counts of all 2-itemsets, and then mines from this matrix all association rules that satisfy a minimum cogency threshold, with cogency computed from pairwise item conditional probabilities. ARUD needs to scan the entire transaction database only once. On four typical unbalanced data sets from the UCI repository, and in comparison with the Apriori and CFP-Growth algorithms, ARUD effectively extracts the minority classes and improves on both running time and memory usage.
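Authors: Liu Yun (刘云), Xiang Chan (向婵)
Source: Journal of Yunnan University (Natural Sciences Edition) (《云南大学学报(自然科学版)》), 2017, No. 1, pp. 33-38 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (61262040)
Keywords: unbalanced data sets; minority classes; association rules; cogency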
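
The two steps named in the abstract (a single scan that fills a matrix of 2-itemset support counts, followed by rule mining against a minimum cogency) can be illustrated with a minimal Python sketch. This is not the authors' ARUD implementation. The abstract does not give the exact cogency formula, so the sketch assumes the usual confabulation-theory form, in which the cogency of a conclusion c is approximated by the product of the pairwise conditional probabilities p(a | c) of the antecedent items. All function names and the toy transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_pair_counts(transactions):
    """Scan the transactions once, counting single items and unordered item pairs."""
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)  # stands in for the link strength matrix
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] += 1
    return item_counts, pair_counts


def cond_prob(a, c, item_counts, pair_counts):
    """Estimate p(a | c) as count(a, c) / count(c)."""
    count_c = item_counts.get(c, 0)
    if count_c == 0:
        return 0.0
    key = (a, c) if a < c else (c, a)
    return pair_counts.get(key, 0) / count_c


def cogency(antecedent, c, item_counts, pair_counts):
    """Assumed cogency: product of pairwise p(a | c) over the antecedent items."""
    value = 1.0
    for a in antecedent:
        value *= cond_prob(a, c, item_counts, pair_counts)
    return value


def mine_rules(transactions, targets, min_cogency):
    """Return single-antecedent rules (a, c, cogency) with cogency >= min_cogency."""
    item_counts, pair_counts = build_pair_counts(transactions)
    rules = []
    for c in targets:
        for a in sorted(item_counts):
            if a == c:
                continue
            score = cogency([a], c, item_counts, pair_counts)
            if score >= min_cogency:
                rules.append((a, c, score))
    return rules


# Toy data: "attack" is the minority class (one transaction out of four).
transactions = [
    ["tcp", "http", "normal"],
    ["tcp", "http", "normal"],
    ["udp", "dns", "normal"],
    ["tcp", "ftp", "attack"],
]
rules = mine_rules(transactions, targets=["attack"], min_cogency=0.9)
```

Because p(a | c) is normalized by the count of the class item c rather than by the size of the whole database, a rule pointing at a rare class can score highly even though its support is small, which is why a cogency threshold suits minority classes better than a global minimum support.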