Abstract
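To address the shortcomings of the PrefixSpan algorithm, the Prefix strategy is modified and infrequent items are discarded. This reduces frequent swapping between main memory and external storage, shrinks the projected databases generated during mining, and lowers the time and space cost of building and scanning them. Experimental results show that, for long sequential pattern mining, the improved algorithm runs more than 35% more efficiently than the original and is better suited to Web mining.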
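The idea of discarding infrequent items before building projected databases can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not describe the modified Prefix strategy, so only the pruning step is shown. The sketch also assumes each sequence element is a single item rather than an itemset, and the function names (`frequent_items`, `project`, `prefixspan`) are hypothetical.

```python
from collections import Counter


def frequent_items(sequences, min_support):
    """Count, per item, how many sequences contain it; keep the frequent ones."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))  # each sequence contributes at most 1 per item
    return {item: c for item, c in counts.items() if c >= min_support}


def project(sequences, item, keep):
    """Build the projected database for `item`.

    Each suffix after the first occurrence of `item` is kept, but items
    outside `keep` (already known to be infrequent) are dropped, so the
    projected database is smaller than the plain suffix database.
    """
    projected = []
    for seq in sequences:
        if item in seq:
            suffix = [x for x in seq[seq.index(item) + 1:] if x in keep]
            if suffix:  # empty suffixes cannot extend any pattern
                projected.append(suffix)
    return projected


def prefixspan(sequences, min_support, prefix=()):
    """Return a dict mapping each frequent sequential pattern to its support."""
    patterns = {}
    frequent = frequent_items(sequences, min_support)
    for item in sorted(frequent):
        new_prefix = prefix + (item,)
        patterns[new_prefix] = frequent[item]
        smaller_db = project(sequences, item, frequent.keys())
        patterns.update(prefixspan(smaller_db, min_support, new_prefix))
    return patterns


if __name__ == "__main__":
    db = [list("abcd"), list("acd"), list("abd"), list("ed")]
    for pattern, count in sorted(prefixspan(db, min_support=2).items()):
        print("".join(pattern), count)
```

Pruning is safe here because an item that is infrequent in a database cannot become frequent in any database projected from it, so removing it changes no pattern's support.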
Generating frequent itemsets is a critical step in association rule mining. Based on an analysis of the Apriori algorithm, a new algorithm for mining frequent itemsets using set and bit operations is proposed. In this algorithm, a digital view represents the transactions that contain each item, and bit operations on the digital view compute the support count of each itemset. The new algorithm avoids the repeated database scans of the Apriori algorithm and improves operating efficiency.
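Bit-based support counting of this kind can be sketched as follows. The abstract does not define the "digital view" precisely, so this example assumes it is one bitmask per item in which bit i is set when transaction i contains the item. The function names `build_bitmaps` and `support` are hypothetical.

```python
from functools import reduce


def build_bitmaps(transactions):
    """Scan the database once and map each item to an integer bitmask.

    Bit i of an item's mask is 1 when transaction i contains the item.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps):
    """Support count of a non-empty itemset: AND the masks, then count 1 bits."""
    if not itemset:
        raise ValueError("itemset must not be empty")
    masks = [bitmaps.get(item, 0) for item in itemset]
    combined = reduce(lambda a, b: a & b, masks)
    return bin(combined).count("1")


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b"], ["a", "b", "c"]]
    maps = build_bitmaps(db)
    print(support({"a", "c"}, maps))
    print(support({"a", "b"}, maps))
```

After the single pass in `build_bitmaps`, every support count is computed from the masks alone, which is how this representation avoids rescanning the database for each candidate itemset.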
Source
《科学技术与工程》
2009, No. 23, pp. 7176-7179 (4 pages)
Science Technology and Engineering
Funding
Supported by the Natural Science Foundation of Guangdong Province (5006102)