Abstract
When mining association rules, finding all the large itemsets from the candidate sets in the early iterations is usually the dominating factor in overall data-mining performance. This paper proposes a hash-based algorithm that optimizes the candidate set generation step of Apriori, and it is especially effective at generating the candidate set for large 2-itemsets. Because the candidate sets generated in the early iterations are smaller, the transaction database can be trimmed effectively at a much earlier stage, thereby reducing the computational cost of later iterations. The proposed algorithm also reduces the amount of disk I/O required.
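The abstract does not spell out the algorithm, so the following is only a minimal sketch of how hash-based filtering of candidate 2-itemsets is commonly done, not the paper's actual implementation. The function names (`bucket_of`, `hash_filtered_pairs`), the bucket count, and the use of Python's built-in `hash` are all assumptions made for illustration. During the first pass every 2-subset of each transaction is hashed into a bucket. A pair can only be large if its bucket total reaches the minimum support, so pairs in light buckets are dropped before the second pass. Database trimming is omitted here.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, n_buckets):
    # Hypothetical hash function; any deterministic mapping of pairs to buckets works.
    return hash(pair) % n_buckets


def hash_filtered_pairs(transactions, min_support, n_buckets=7):
    """Return (candidate 2-itemsets, large 2-itemsets with their counts)."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets

    # Pass 1: count single items and hash every 2-subset into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, n_buckets)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Keep a pair of large items only if its bucket total reached min_support.
    # A bucket total is an upper bound on the pair's own count, so no truly
    # large pair is ever discarded; collisions can only let extra pairs through.
    candidates = [
        p for p in combinations(frequent_items, 2)
        if bucket_counts[bucket_of(p, n_buckets)] >= min_support
    ]

    # Pass 2: count only the surviving candidates.
    candidate_set = set(candidates)
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            if pair in candidate_set:
                pair_counts[pair] += 1

    large_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return candidates, large_pairs


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a", "b", "c"]]
    cands, large = hash_filtered_pairs(demo, min_support=3)
    print(len(cands), "candidates;", large)
```

With more buckets there are fewer collisions, so the candidate set shrinks toward the set of truly large pairs at the cost of more memory for the bucket table.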
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University core journal list)
2000, No. 8, pp. 99-102 (4 pages)
Computer Engineering and Applications