A parallel dynamic bit vector based frequent closed sequence pattern mining algorithm (cited by: 2)
Abstract Mining long sequence databases is computationally expensive in both time and space, so mining models that are more efficient and more compact while still extracting complete information are an active research topic. This paper proposes a parallel dynamic bit vector frequent closed sequence pattern mining algorithm (PDBV-FCSP). The algorithm combines a multi-core processor architecture with the dynamic bit vector (DBV) data structure to speed up processing of the sequence database. It partitions the search space and performs the closure check on pre-processed sequences as early as possible, which reduces both the storage required and the execution time of mining frequent closed sequence patterns, and avoids the communication overhead, synchronization, and data replication problems of existing parallel mining algorithms. A dynamic load balancing mechanism that redistributes work addresses load imbalance among processors and minimizes CPU idle time. In simulations comparing the DBV-VDF algorithm with PDBV-FCSP running on 2 to 4 cores, PDBV-FCSP performs better in running time, memory usage, and scalability, and its advantage grows as the number of cores increases.
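The abstract does not spell out the DBV layout, so the following is only a minimal sketch under an assumption: a dynamic bit vector is taken to be an ordinary bit vector over sequence ids with its leading and trailing zero bytes dropped, plus the index of the first byte kept. The class name `DBV` and the methods `from_ids`, `support`, and `ids` are hypothetical names chosen for illustration and are not taken from the paper. Intersecting two such vectors gives the ids of the sequences that contain both items, and counting the set bits gives the support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DBV:
    """Assumed layout: trimmed bit vector plus offset of its first byte."""
    start: int    # index of the first kept byte in the full bit vector
    data: bytes   # bytes from the first to the last non-zero byte

    @classmethod
    def _trim(cls, buf, offset):
        # Drop zero bytes at both ends and record where the rest begins.
        lo, hi = 0, len(buf)
        while lo < hi and buf[lo] == 0:
            lo += 1
        while hi > lo and buf[hi - 1] == 0:
            hi -= 1
        if lo == hi:
            return cls(0, b"")
        return cls(offset + lo, bytes(buf[lo:hi]))

    @classmethod
    def from_ids(cls, ids):
        # Build from 0-based sequence ids: bit i is set if sequence i matches.
        ids = list(ids)
        if not ids:
            return cls(0, b"")
        full = bytearray(max(ids) // 8 + 1)
        for i in ids:
            full[i // 8] |= 1 << (i % 8)
        return cls._trim(full, 0)

    def __and__(self, other):
        # Only the overlapping byte range can contain common set bits.
        lo = max(self.start, other.start)
        hi = min(self.start + len(self.data), other.start + len(other.data))
        if lo >= hi:
            return DBV(0, b"")
        buf = bytearray(
            self.data[i - self.start] & other.data[i - other.start]
            for i in range(lo, hi)
        )
        return DBV._trim(buf, lo)

    def support(self):
        # Number of sequences represented, i.e. the count of set bits.
        return sum(bin(b).count("1") for b in self.data)

    def ids(self):
        # Recover the sequence ids from the trimmed representation.
        return [
            (self.start + k) * 8 + bit
            for k, b in enumerate(self.data)
            for bit in range(8)
            if (b >> bit) & 1
        ]


a = DBV.from_ids([17, 18, 40])
b = DBV.from_ids([3, 18, 40, 90])
common = a & b
print(common.ids(), common.support())
```

Because the intersection only walks the overlapping byte range, sparse vectors whose set bits cluster in a narrow id range stay small in memory, which is consistent with the storage savings the abstract attributes to the DBV structure.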
Authors CHEN Qian; LIU Yun; GAO Yu-ying (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 10, pp. 1717-1725 (9 pages)
Funding National Natural Science Foundation of China (61262040)
Keywords data mining; closed sequence pattern; dynamic bit vector; multi-core processor; PDBV-FCSP algorithm