Abstract
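This paper studies existing MapReduce-based parallel frequent itemset mining algorithms and proposes a parallel closed frequent itemset mining algorithm based on a postfix table. By introducing the postfix table and mining closed frequent itemsets, the algorithm reduces the volume of data transferred between components and improves mining efficiency. Experiments show that the algorithm effectively shortens average mining time and performs well on high-dimensional big data.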
Abstract: Building on existing frequent itemset mining with the parallel FP-Growth algorithm on the MapReduce framework, this paper proposes a parallel closed frequent itemset mining algorithm that uses a postfix table on the MapReduce framework. The algorithm generates closed frequent itemsets instead of all frequent itemsets. With the postfix-table structure, it reduces the amount of data transferred between mappers and reducers. The experimental results show that the algorithm shortens mining time and performs well, especially in long-transaction mode.
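To illustrate why generating closed frequent itemsets instead of all frequent itemsets yields a smaller result, the following is a minimal single-machine sketch. It is not the paper's algorithm: it uses brute-force enumeration rather than FP-Growth, MapReduce, or a postfix table, and the function names and the toy transactions are assumptions made for illustration. An itemset is treated as closed when no proper superset has the same support.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }


# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("a")]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

On this toy input there are 7 frequent itemsets but only 3 closed ones ({a}, {a, b}, {a, b, c}), and the support of every frequent itemset can still be recovered from the closed set.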
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2014, Issue 2, pp. 373-377 (5 pages)
Application Research of Computers
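Funding
National Natural Science Foundation of China (61170277)
Key Scientific Research and Innovation Project of the Shanghai Municipal Education Commission (12zz137)
Shanghai First-Class Discipline Construction Project (S1205YLXK)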