Abstract
This paper proposes DMSP (Distributed Mining of Sequential Patterns), an algorithm for mining sequential patterns in a distributed environment. The main idea is threefold: each site uses prefix projection to partition the pattern search space and reduce the size of the database, generating local sequential patterns; each site uses a polling site assigned by pattern prefix to reduce communication cost; and multiple threads run asynchronously at each site to increase the parallelism of the algorithm. Experiments show that, in a LAN environment with a huge amount of data, DMSP outperforms centralizing the data and then applying the GSP algorithm by more than 65 percent, and that it is scalable.
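The two data-side ideas in the abstract, prefix projection and prefix-based polling-site assignment, can be illustrated with a minimal single-machine sketch. This is not the DMSP implementation: the abstract does not give the protocol, so the function names (`project`, `local_patterns`, `polling_site`), the restriction to sequences of single items, and the choice of hashing the first item of a pattern with CRC32 are all assumptions made for illustration. Multi-threading and inter-site messaging are omitted.

```python
from collections import Counter
import zlib


def project(database, item):
    """Prefix projection: keep the suffix after the first occurrence of `item`."""
    projected = []
    for seq in database:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:  # empty suffixes cannot extend a pattern
                projected.append(suffix)
    return projected


def local_patterns(database, min_support, prefix=()):
    """Grow patterns recursively on projected databases.

    Returns {pattern tuple: support in this (local) database}.
    """
    counts = Counter()
    for seq in database:
        counts.update(set(seq))  # count each item at most once per sequence
    patterns = {}
    for item, support in counts.items():
        if support >= min_support:
            pattern = prefix + (item,)
            patterns[pattern] = support
            # Recurse on the smaller, projected database for this prefix.
            patterns.update(
                local_patterns(project(database, item), min_support, pattern)
            )
    return patterns


def polling_site(pattern, num_sites):
    """Hypothetical assignment: hash the pattern's first item to a site index.

    CRC32 is used instead of the built-in hash() so that every site
    computes the same index for the same prefix.
    """
    return zlib.crc32(str(pattern[0]).encode("utf-8")) % num_sites


# Toy local database held by one site (hypothetical data).
site_db = [("a", "b", "c"), ("a", "c"), ("b", "c")]
found = local_patterns(site_db, min_support=2)
```

In this sketch, every pattern sharing the same first item maps to the same site, so local support counts for that family of patterns could be sent to a single place for global counting rather than broadcast to all sites.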
Source
Journal of Fudan University: Natural Science (《复旦学报(自然科学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2004, No. 5, pp. 737-741 (5 pages)
Funding
National Natural Science Foundation of China (Grant Nos. 70171052, 60075015)