Abstract
Mining frequent itemsets efficiently from dense databases has long been a difficult and important problem in data mining. This paper introduces a new data storage format, the diffset. Converting a dense database into a diffset database substantially reduces the size of the database, the intermediate results generated during mining, and the CPU time. The paper presents an algorithm for mining frequent itemsets based on the diffset database, and experiments show that the algorithm is effective.
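To make the diffset idea concrete, the following is a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes the definition of diffsets commonly used in vertical mining: for an itemset PX with prefix P, the diffset d(PX) holds the transaction IDs that contain P but not PX, so that support(PXY) = support(PX) - |d(PXY)| and d(PXY) = d(PY) - d(PX). The function name `mine_diffsets`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# One class member: (itemset as an ordered tuple, its diffset, its support)
Member = Tuple[Tuple[str, ...], Set[int], int]


def mine_diffsets(transactions: List[Set[str]],
                  min_support: int) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support} for every frequent itemset (hypothetical helper)."""
    all_tids = set(range(len(transactions)))
    items = sorted({item for t in transactions for item in t})

    # Level 1: the diffset of a single item is the set of transactions
    # that do NOT contain it (the prefix is the empty itemset).
    level: List[Member] = []
    for item in items:
        diff = {tid for tid in all_tids if item not in transactions[tid]}
        support = len(all_tids) - len(diff)
        if support >= min_support:
            level.append(((item,), diff, support))

    result: Dict[FrozenSet[str], int] = {}

    def extend(members: List[Member]) -> None:
        # All members share the same prefix and differ in their last item.
        for idx, (iset_a, diff_a, sup_a) in enumerate(members):
            result[frozenset(iset_a)] = sup_a
            next_members: List[Member] = []
            for iset_b, diff_b, _ in members[idx + 1:]:
                # d(PXY) = d(PY) - d(PX); support(PXY) = support(PX) - |d(PXY)|
                diff_ab = diff_b - diff_a
                sup_ab = sup_a - len(diff_ab)
                if sup_ab >= min_support:
                    next_members.append((iset_a + (iset_b[-1],), diff_ab, sup_ab))
            extend(next_members)

    extend(level)
    return result


if __name__ == "__main__":
    # Hypothetical dense toy database: most items occur in most transactions.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]
    for itemset, support in sorted(mine_diffsets(db, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

In this toy database each item is missing from only one transaction, so every level-1 diffset has a single element while the corresponding transaction-ID list would have four. That size gap is the kind of saving the abstract attributes to diffsets on dense data.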
Source
《计算机工程与应用》 (Computer Engineering and Applications)
CSCD
PKU Core Journal (北大核心)
2005, No. 8, pp. 173-175, 232 (4 pages)
Keywords
diffset
association rule
frequent itemset
dense database