Abstract
Frequent itemset mining is a fundamental and pivotal problem in data stream mining. Most existing algorithms, however, mine frequent itemsets from a single data stream; Apriori is the traditional algorithm for that setting. Mining frequent itemsets across multiple data streams introduces redundancy in both data and patterns, which challenges the time and space efficiency of an algorithm. Building on Apriori and concurrent multithreading, this paper proposes A-Apriori, an improved algorithm for mining frequent items in distributed data streams. It uses level-wise iteration and concurrency to handle the case where multiple data streams arrive simultaneously. Experimental results show that, while preserving mining accuracy, the algorithm is more efficient than other algorithms for mining frequent items in distributed data streams.
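The abstract gives no implementation details for A-Apriori, so the following is only a minimal sketch of the general idea it describes: level-wise (Apriori-style) candidate generation combined with concurrent support counting, one worker thread per stream. The function names, the batch representation (each stream as a list of transactions), and the relative `min_support` threshold are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_candidates(batch, candidates):
    """Count how many transactions in one stream's batch contain each candidate."""
    counts = Counter()
    for transaction in batch:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:
                counts[candidate] += 1
    return counts


def next_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, pruning by the Apriori property."""
    candidates = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        # Keep the union only if every (k-1)-subset is itself frequent.
        if len(union) == k and all(
            frozenset(subset) in frequent for subset in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def mine_frequent_itemsets(stream_batches, min_support):
    """Hypothetical helper: level-wise mining with one counting thread per stream.

    stream_batches: list of batches, one per stream; each batch is a list of
    transactions (iterables of hashable items). min_support is a fraction of
    the total number of transactions across all streams.
    """
    total = sum(len(batch) for batch in stream_batches)
    threshold = min_support * total
    candidates = {
        frozenset([item])
        for batch in stream_batches
        for transaction in batch
        for item in transaction
    }
    result = {}
    k = 1
    with ThreadPoolExecutor(max_workers=max(1, len(stream_batches))) as pool:
        while candidates:
            # Each stream is counted concurrently against the same candidate set.
            partials = pool.map(
                count_candidates, stream_batches, [candidates] * len(stream_batches)
            )
            merged = Counter()
            for partial in partials:
                merged.update(partial)
            frequent = {c for c, n in merged.items() if n >= threshold}
            result.update({c: merged[c] for c in frequent})
            k += 1
            candidates = next_candidates(frequent, k)
    return result
```

For example, `mine_frequent_itemsets([[["a", "b"], ["a", "b", "c"]], [["a", "c"]]], 0.5)` returns a dictionary mapping each frequent itemset (as a `frozenset`) to its combined support count. Note that in CPython, threads do not run pure-Python counting in parallel because of the global interpreter lock; the sketch shows the structure of per-stream concurrency rather than a tuned implementation.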
Source
《微计算机信息》
2010, No. 30, pp. 144-145, 164 (3 pages)
Control & Automation
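Funding
Grant applicant: Mao Guojun (毛国君)
Project title: Research on integrated pattern mining models and concept drift detection algorithms for distributed data streams
Funding agency: National Natural Science Foundation of China (60496322)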
Keywords
distributed data stream
frequent item
concurrent multithreading technology