Abstract
Data mining is the theoretical core of big data analysis, and association rule mining is an important branch of data mining. Association rule mining has two steps: generating the frequent itemsets and deriving the association rules. The frequent-itemset generation step accounts for most of the algorithm's cost. This paper starts from the properties of maximal frequent itemsets. It changes the data storage structure and applies the idea of M-Bisearch, compressing the storage space to reduce both the number of database scans and the cost of support counting, which improves the algorithm's execution efficiency. Experiments show that the improved algorithm has a clear advantage when mining frequent itemsets with medium-length and long patterns.
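The abstract names two ideas. The first is changing the data storage structure so that support counting needs fewer database scans. The second is focusing on maximal frequent itemsets. The sketch below illustrates both concepts under its own assumptions, and all function names in it are hypothetical. It uses a vertical bitmap layout, with one bitset per item built in a single scan, followed by a plain level-wise (Apriori-style) search. It does not reproduce the paper's M-Bisearch procedure or its compression scheme.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Single scan of the database (hypothetical helper): map each item to a
    bitset whose bit t is set when transaction t contains that item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps):
    """Support count of a non-empty itemset: AND the item bitsets and count
    the set bits, so no further pass over the transactions is needed."""
    items = iter(itemset)
    mask = bitmaps.get(next(items), 0)
    for item in items:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def frequent_itemsets(bitmaps, min_sup):
    """Level-wise search for all itemsets with support >= min_sup."""
    level = [frozenset([i]) for i in bitmaps if support([i], bitmaps) >= min_sup]
    frequent = set(level)
    while level:
        prev = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # join: keep unions exactly one item larger;
            # prune: every k-subset of the candidate must itself be frequent
            if len(union) == len(a) + 1 and all(union - {x} in prev for x in union):
                candidates.add(union)
        level = [c for c in candidates if support(c, bitmaps) >= min_sup]
        frequent.update(level)
    return frequent


def maximal_itemsets(frequent):
    """A frequent itemset is maximal if no proper superset is also frequent."""
    return {s for s in frequent if not any(s < t for t in frequent)}


# Toy database of five transactions (illustrative data, not from the paper)
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
bm = build_bitmaps(transactions)
freq = frequent_itemsets(bm, min_sup=2)
print(support({"a", "b", "c"}, bm))                        # 2
print(sorted(sorted(s) for s in maximal_itemsets(freq)))   # [['a', 'b', 'c']]
```

In this toy example, the single maximal frequent itemset {a, b, c} implies all seven frequent itemsets, because every subset of a frequent itemset is frequent. This is why approaches built on maximal frequent itemsets can avoid enumerating the full frequent collection explicitly.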
Authors
LI Bao-lin (李宝林), ZHOU Kun (周坤), LI Shi-wei (李仕伟)
College of Computer Science, China West Normal University, Nanchong 637000, China
Source
Journal of Chengdu University of Information Technology (《成都信息工程大学学报》)
2016, No. 5, pp. 463-468 (6 pages)
Funding
Science and Technology Support Program of the Sichuan Provincial Department of Science and Technology (2013SZ0056)