Abstract
With the development of Internet technology, the volume of network data keeps growing, and how to mine useful information from it has become a research focus. In recent years, frequent itemset mining has attracted increasing attention because of its important role in tasks such as association rule mining and correlation mining. This paper studies frequent itemset mining algorithms in a distributed computing environment and improves the PFP-Growth algorithm. The improved PFP-Growth algorithm is implemented and applied using the MapReduce programming model, so that users can efficiently obtain all required frequent itemsets from massive data. Experimental results show that the algorithm achieves high efficiency and scalability on massive data.
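As an illustration of the MapReduce style mentioned in the abstract, the following is a minimal single-machine sketch of the parallel counting step that PFP-Growth-style algorithms typically begin with: mappers emit one count per item, and reducers sum the counts and keep items that meet a minimum support. It is not the improved algorithm from the paper, whose details are not given in this record. The function names (`map_phase`, `reduce_phase`, `frequent_items`), the toy transactions, and the `min_support` value are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(transaction):
    """Mapper (hypothetical): emit (item, 1) for each distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_phase(pairs):
    """Reducer (hypothetical): sum the counts emitted for each item."""
    counts = defaultdict(int)
    for item, count in pairs:
        counts[item] += count
    return dict(counts)


def frequent_items(transactions, min_support):
    """Return (item, count) pairs with count >= min_support, most frequent first."""
    emitted = chain.from_iterable(map_phase(t) for t in transactions)
    counts = reduce_phase(emitted)
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, assumed for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
result = frequent_items(transactions, min_support=3)
print(result)
```

In a real MapReduce deployment the mapper and reducer would run on separate nodes over shards of the transaction database; here they are ordinary functions so that the data flow is easy to follow.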
Source
Computer Technology and Development (《计算机技术与发展》)
2013, No. 9, pp. 63-65, 198 (4 pages in total)
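Funding
Anhui Provincial Natural Science Research Project for Universities (kj2011z039)
Anhui University of Technology Innovation Fund for Master's Student Supervisors (D2011024)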