期刊文献+

CBC-DS:基于频繁闭模式的数据流分类算法 被引量:3

CBC-DS:A Classification Algorithm Based on Closed Frequent Patterns for Mining Data Streams
下载PDF
导出
摘要 基于关联规则的分类算法通常根据频繁模式生成类关联规则,但频繁模式挖掘易遭受组合爆炸问题,影响算法效率.并且数据流的出现也对分类算法提出了新的挑战.相对于频繁模式,频繁闭模式的数目较少,挖掘频繁闭模式的算法通常具有较高的效率.为此,提出了一种高效的基于频繁闭模式的数据流分类算法—CBC-DS.主要贡献在于:1)提出了一种基于逆文法顺序FP-Tree的频繁闭项集单遍挖掘过程,用于挖掘类关联规则,该过程采用了一种混合项顺序搜索策略以满足数据流挖掘的单遍性需求,并采用位图技术提高效率;2)提出了"自支持度"概念,用于筛选规则以提高算法分类精度.实验表明,位图技术能够提高算法速度2倍以上,利用自支持度能够提高算法平均精度0.5%左右;最终CBC-DS算法的平均分类精度比经典算法CMAR高1%左右,并且CBC-DS算法的规则挖掘速度远快于CMAR算法. The classification algorithms based on association rules generally generate classification association rules by frequent patterns. As mining frequent patterns often suffer from the problem of combinatorial explosion, the efficiency of the algorithms is low. Moreover, the emergence of data streams has posed new challenges for classification algorithms. In contrast to frequent patterns, the number of closed frequent patterns is less, so that the efficiency of algorithms for mining closed frequent patterns is higher. A novel and efficient closed-frequent-patterns based classification algorithm, CBC-DS, is proposed for classifying data streams. The contributions are listed as follows. (1) a single-pass closed frequent itemsets mining process based on reverse lexicographic order FP-tree is introduced for mining classification association rules, which uses a kind of mixed item-ordering searching policy to satisfy the single-pass requirement of data streams and uses the bitmap technology to improve the efficiency; (2) the concept of self-support for filtering rules is proposed to improve the precision. The experimental results show that the bitmap technology can improve the efficiency of the algorithm about twice at least and the average classifying precision can be improved about 0. 5% by using self-support. Eventually, the average precision of CBC-DS is about 1% higher than that of CMAR, and CBC-DS is much faster than CMAR.
出处 《计算机研究与发展》 EI CSCD 北大核心 2009年第5期779-786,共8页 Journal of Computer Research and Development
基金 国家自然科学基金项目(60573057 60704038)~~
关键词 数据流 分类 关联规则 频繁闭模式 自支持度 data stream classification association rule closed frequent pattern self-support
  • 相关文献

参考文献14

  • 1Wang J, Karypis G. HARMONY: Efficiently mining the best rules for classification [C] //Proc of 2005 SIAM Conf of Data Mining (SDM'05). 2005: 205-216
  • 2Liu B, Hsu W, Ma Y. Integrating classification and association rule mining [C] //Proc of KDD'98. 1998:80-86
  • 3Li W, Han J, Pei J. CMAR: Accurate and efficient classification based on multiple class-association rules [C] //Proc of ICDM'01. Berlin: Springer, 2001:369-376
  • 4王鹏,吴晓晨,王晨,汪卫,施伯乐.CAPE——数据流上的基于频繁模式的分类算法[J].计算机研究与发展,2004,41(10):1677-1683. 被引量:7
  • 5李宏,陈松乔,易丽君,周明,李翔.基于闭合模式的高维生物数据分类算法研究[J].小型微型计算机系统,2007,28(8):1423-1426. 被引量:1
  • 6Gosta G, Jianfei Z. Efficiently Using prefix-trees in mining frequent itemsets [C] //Proc of FIMI'04. Piscataway, NJ: IEEE, 2003
  • 7Chi Y, Wang H, Yu P S, et al. Moment: Maintaining closed frequent itemsets over a stream sliding window [C]//Proc of ICDM'04. Piscataway, NJ: IEEE, 2004:59-66
  • 8Pei J, Han J, Wang J. Closet+: Searching for the best strategies for mining frequent closed itemsets [C]//Proc of SIGKDD '03. New York: ACM, 2003
  • 9Burdiek D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases [C] //Proc of the 17tb Int Conf on Data Engineering. Piseataway, NJ: IEEE, 2001:443-452
  • 10Coenen F. LUCS KDD implementation of CMAR [OL]. [2007-10-07J. http://www. esc. liv. ac. uk/-frans/KDD/ Software/CMAR/emar. html, The University of Liverpool

二级参考文献21

  • 1李颖新,阮晓钢.基于支持向量机的肿瘤分类特征基因选取[J].计算机研究与发展,2005,42(10):1796-1801. 被引量:51
  • 2J Han, M Kamber. Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann, 2000
  • 3B Babcock, S Babu, M Datar, et al. Models and issues in data stream systems. In: Proc of ACM Symp on Principles of Database Systems (PODS-02). New York: ACM Press, 2002
  • 4Y Chen, G Dong, J Han,et al. Multi-dimensional regression analysis of time-series data streams. In: Proc of Very Large Database (VLDB02). San Francisco: Morgan Kaufmann, 2002
  • 5J-M Adamo. Data Mining for Association Rules and Sequential Patterns: Sequential and Parallel Algorithms. New York:Springer-Verlag, 2001
  • 6G Hulten, L Spencer, P Domingos. Mining time-changing data streams. In: Proc of the Int'l Conf on Knowledge Discovery and Data Mining (SIGKDD01). New York: ACM Press, 2001. 97~106
  • 7Haixun Wang, Wei Fan Philip S Yu, Jiawei Han. Mining concept-drifting data streams using ensemble classifiers. In: Proc of the Int'l Conf on Knowledge Discovery and Data Mining (SIGKDD03). New York: ACM Press, 2003
  • 8B Liu, W Hsu, Y Ma. Integrating classification and association rule mining. KDD'98, New York, 1998
  • 9W Li, J Han, J Pei. CMAR: Accurate and efficient classiffication based on multiple class-association rules. In: Proc of ICDM' 01.Washington, D C: IEEE Computer Society Press, 2001. 369~376
  • 10X Yin, J Han. CPAR: Classification based on predictive association rules. The 2003 SIAM Int'l Conf on Data Mining (SDM'03), San Fransisco, CA, 2003

共引文献6

同被引文献71

引证文献3

二级引证文献19

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部