Abstract
Frequent itemset mining is a key step in association rule mining. On dense datasets, traditional mining algorithms tend to generate large numbers of useless intermediate results, which wastes a great deal of memory, especially when the support threshold is low. The Diffsets algorithm introduces the concept of a "difference set" (diffset) and thereby, to some extent, resolves the conflict between the large volume of intermediate results produced during mining and the available memory capacity. The improved Diffsets algorithm builds on the original: during the difference-set computation it sorts the diffsets in descending order of the number of transaction identifiers (tids) they contain, which further reduces the number of intermediate results produced during mining. Analysis and a worked example show that the improved algorithm occupies less memory during execution and converges faster.
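The diffset idea summarized in the abstract can be illustrated with a minimal Python sketch. It follows the general diffset recurrence (the diffset of a combined itemset is the difference of two sibling diffsets, and its support is the parent's support minus the size of that diffset) and adds an optional descending sort by diffset size. The function names `mine_diffsets` and `_extend`, the `sort_desc` flag, and the exact point at which the sort is applied are assumptions made for illustration; the abstract does not give the paper's precise procedure, so this should not be read as the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# One class member: (itemset as a tuple, its diffset, its support count).
Member = Tuple[Tuple[str, ...], Set[int], int]


def mine_diffsets(transactions: List[Set[str]], min_support: int,
                  sort_desc: bool = True) -> Dict[FrozenSet[str], int]:
    """Return {frequent itemset: support count} using diffsets."""
    n = len(transactions)
    all_tids = set(range(n))
    items = sorted({item for t in transactions for item in t})

    level: List[Member] = []
    for item in items:
        tidset = {tid for tid, t in enumerate(transactions) if item in t}
        diff = all_tids - tidset      # tids that do NOT contain the item
        support = n - len(diff)
        if support >= min_support:
            level.append(((item,), diff, support))

    result: Dict[FrozenSet[str], int] = {}
    _extend(level, min_support, sort_desc, result)
    return result


def _extend(members: List[Member], min_support: int, sort_desc: bool,
            result: Dict[FrozenSet[str], int]) -> None:
    # Assumed reading of the abstract: order the class by diffset size,
    # largest first, before combining its members.
    if sort_desc:
        members = sorted(members, key=lambda m: len(m[1]), reverse=True)

    for i, (itemset_i, diff_i, sup_i) in enumerate(members):
        result[frozenset(itemset_i)] = sup_i
        next_class: List[Member] = []
        for itemset_j, diff_j, _sup_j in members[i + 1:]:
            # d(PXY) = d(PY) - d(PX); sup(PXY) = sup(PX) - |d(PXY)|
            diff_ij = diff_j - diff_i
            sup_ij = sup_i - len(diff_ij)
            if sup_ij >= min_support:
                next_class.append(
                    (itemset_i + (itemset_j[-1],), diff_ij, sup_ij))
        if next_class:
            _extend(next_class, min_support, sort_desc, result)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in mine_diffsets(data, min_support=3).items():
        print(sorted(itemset), support)
```

The ordering only changes which candidates are generated and how large the stored diffsets are along the way; the set of frequent itemsets returned is the same with or without it, which makes the flag convenient for comparing the two variants on the same data.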
Source
《现代电子技术》 (Modern Electronics Technique)
2008, Issue 22, pp. 80-83, 87 (5 pages)
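Funding
Ningxia Natural Science Foundation project (NZ0697)
Ningxia Higher Education Institutions Scientific and Technological Research Project (2006JY018)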