Abstract
Frequent correlations between block I/Os are common in storage systems. These correlations among data blocks are valuable for improving data layout and optimizing prefetching strategies. Traditional frequent correlation sequence mining algorithms do not take temporal locality into account and therefore cannot mine the frequent correlations between block I/Os effectively. This paper proposes an improved, temporal-locality-aware apriori algorithm based on a correlation-strengthening window to mine frequent correlation sequences between block I/Os. It also mines secondary frequent correlation sequences, whose support falls below the minimum threshold but which should not be ignored, so that they complement the frequent sequences. The algorithm is evaluated on three real traces. The experimental results show that the improved apriori algorithm is better suited to mining frequent and secondary frequent correlation sequences in block I/O data streams. It also addresses the weakness of traditional frequent correlation sequence mining algorithms on time-sensitive, stream-like data, and it runs with less time overhead than the apriori algorithm.
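The abstract does not spell out how the window or the two support levels are defined, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes fixed, non-overlapping time windows, restricts itself to block pairs (the first apriori level), and uses two hypothetical thresholds, `min_support` and `secondary_support`, to separate frequent from secondary frequent pairs. All function and parameter names are illustrative.

```python
from collections import Counter
from itertools import combinations


def windowed_pair_support(trace, window):
    """Count, per unordered block pair, the time windows containing both blocks.

    trace:  list of (timestamp, block_id) tuples.
    window: width of a fixed, non-overlapping time window (an assumption;
            the paper's window may be defined differently).
    Returns (pair_counts, number_of_non_empty_windows).
    """
    counts = Counter()
    if not trace:
        return counts, 0
    start = min(ts for ts, _ in trace)
    buckets = {}
    for ts, block in trace:
        # Blocks accessed close together in time fall into the same bucket.
        buckets.setdefault(int((ts - start) // window), set()).add(block)
    for blocks in buckets.values():
        for pair in combinations(sorted(blocks), 2):
            counts[pair] += 1
    return counts, len(buckets)


def classify_pairs(counts, n_windows, min_support, secondary_support):
    """Split pairs into frequent and secondary frequent by relative support."""
    frequent, secondary = {}, {}
    for pair, count in counts.items():
        support = count / n_windows
        if support >= min_support:
            frequent[pair] = support
        elif support >= secondary_support:
            # Below the main threshold, but kept as a complementary pattern.
            secondary[pair] = support
    return frequent, secondary


# Hypothetical trace: (timestamp, block_id)
trace = [(0, "A"), (1, "B"), (10, "A"), (11, "B"),
         (20, "A"), (21, "C"), (30, "A"), (31, "B"), (32, "C")]
counts, n_windows = windowed_pair_support(trace, window=5)
frequent, secondary = classify_pairs(counts, n_windows,
                                     min_support=0.6, secondary_support=0.4)
print(frequent)
print(secondary)
```

Grouping accesses by time window before counting is what makes the support values sensitive to temporal locality: two blocks accessed far apart in time never contribute to each other's support, however often each appears in the trace overall.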
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
PKU Core Journal (北大核心)
2015, Issue 5, pp. 990-995 (6 pages)
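Funding
Supported by the National Natural Science Foundation of China (61272073, 61073064)
Supported by the Natural Science Foundation of Guangdong Province, Key Project (S2013020012865)
Supported by the Science and Technology Innovation Project of the Guangdong Provincial Department of Education (2012KJCX0013)
Supported by the Open Project of the State Key Laboratory of Computer Architecture, Chinese Academy of Sciences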
Keywords
strengthen correlation window
I/O block correlation
frequent correlation sequence
secondary frequent correlation sequence
similar data stream