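Abstract: With the rapid development of hardware and communication technology, data stream techniques have been widely applied in financial analysis, network monitoring, sensor networks, and many other fields. Such applications are typically high-speed, massive, continuous, and real-time. Updating an index over a data stream incrementally and in real time has therefore become a valuable and challenging problem. The performance of most existing index trees that support frequent updates is strongly affected by the processor cache miss rate. To overcome this drawback, a novel quantized R*-tree based on double memos, the QDM-Tree (Quantized R*-tree with Double Memos), is proposed, together with the corresponding insertion, deletion, update, and range query algorithms. Theoretical analysis shows that, compared with the existing R*-tree and its variants, the QDM-Tree compresses tree nodes severalfold and supports frequent updates more effectively.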
Funding: Project supported by the Shanghai Leading Academic Discipline Project (Grant No. J50103) and the Graduate Student Innovation Foundation of Shanghai University (Grant No. SHUCX112167).
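The QDM-Tree abstract does not spell out its quantization scheme. As a rough illustration of how quantizing bounding rectangles can shrink R-tree nodes, the sketch below stores a child rectangle as small integers relative to its parent rectangle. The function names, the 8-bit default, and the outward rounding rule are assumptions made for this example, not details taken from the paper.

```python
import math


def quantize_mbr(child, parent, bits=8):
    """Quantize a child rectangle (xmin, ymin, xmax, ymax) relative to its parent.

    Assumption for this sketch: lower bounds are rounded down and upper bounds
    are rounded up, so the quantized box never excludes part of the original.
    """
    levels = (1 << bits) - 1
    px0, py0, px1, py1 = parent
    width = (px1 - px0) or 1.0   # avoid division by zero for degenerate parents
    height = (py1 - py0) or 1.0
    cx0, cy0, cx1, cy1 = child
    raw = (
        math.floor((cx0 - px0) / width * levels),
        math.floor((cy0 - py0) / height * levels),
        math.ceil((cx1 - px0) / width * levels),
        math.ceil((cy1 - py0) / height * levels),
    )
    # Clamp to the representable range [0, levels].
    return tuple(max(0, min(levels, v)) for v in raw)


def dequantize_mbr(quantized, parent, bits=8):
    """Map quantized integer coordinates back into the parent's coordinate space."""
    levels = (1 << bits) - 1
    px0, py0, px1, py1 = parent
    width, height = px1 - px0, py1 - py0
    qx0, qy0, qx1, qy1 = quantized
    return (
        px0 + qx0 / levels * width,
        py0 + qy0 / levels * height,
        px0 + qx1 / levels * width,
        py0 + qy1 / levels * height,
    )
```

With 8 bits per coordinate, a rectangle occupies 4 bytes instead of four floating-point values, at the cost of a slightly enlarged box that may produce extra candidates during range queries.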
Abstract: Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews studies on exploiting the single instruction, multiple data (SIMD) capability of processors to improve tree search performance, and proposes several improvements to reported SIMD tree search algorithms. Building on a blocked tree structure, blocking for memory alignment and dynamic blocking prefetch are proposed to reduce memory access overhead. Furthermore, search branch unwinding, a form of non-linear loop unrolling, shows that the number of branches in the SIMD search algorithm can exceed the data width of SIMD instructions. The experiments suggest that the blocking-optimized SIMD tree search algorithm responds 1.6 times faster than the unoptimized algorithm.
Funding: Supported by the National Natural Science Foundation of China (60473073).
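The core pattern behind SIMD tree search is to compare the query against several separator keys in one step and use the number of keys smaller than the query to pick the next branch. Python has no SIMD intrinsics, so the sketch below only models that pattern: the comparison over `seps` stands in for one vector compare followed by a mask count. The function name `kary_lower_bound` and the default fan-out `k=5` are assumptions for this example, and the paper's blocking, alignment, and prefetch optimizations are not reproduced.

```python
def kary_lower_bound(keys, query, k=5):
    """Return the index of the first element >= query in a sorted list.

    Each iteration compares the query against up to k - 1 separator keys,
    which is the step a SIMD implementation would do with one vector compare.
    """
    lo, hi = 0, len(keys)
    while lo < hi:
        step = max(1, (hi - lo) // k)
        # Positions of the separator keys inside the current range [lo, hi).
        seps = list(range(lo + step - 1, hi, step))[: k - 1]
        # Because keys are sorted, the separators below the query form a prefix,
        # so counting them identifies the sub-range to descend into.
        n_less = sum(1 for i in seps if keys[i] < query)
        if n_less < len(seps):
            hi = seps[n_less]          # first separator that is >= query
        if n_less > 0:
            lo = seps[n_less - 1] + 1  # just past the last separator < query
    return lo
```

For example, `kary_lower_bound([0, 3, 6], 4)` narrows the range to index 2 in a single step, where a binary search would need two comparisons in sequence.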
Abstract: Processing a join over unbounded input streams requires unbounded memory, since every tuple in one infinite stream must be compared with every tuple in the other. In practice, most join queries over unbounded input streams are restricted to finite memory by sliding window constraints. Non-indexed and indexed stream equijoin algorithms based on sliding windows have been proposed in the literature, but none of them takes non-equijoins into consideration, even though non-equijoin queries occur frequently. It is therefore worth discussing how to process non-equijoin queries effectively and efficiently. In this paper, we propose an indexed join algorithm that supports non-equijoin queries. The experimental results show that our indexed non-equijoin techniques are more efficient than those without an index.
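To make the idea of an indexed non-equijoin over sliding windows concrete, here is a minimal sketch for the predicate `R.key > S.key`. It assumes count-based windows and uses a sorted list with binary search as the index; the abstract does not say which index structure or window type the paper uses, and the names `WindowIndex` and `inequality_join` are hypothetical.

```python
import bisect
from collections import deque


class WindowIndex:
    """Count-based sliding window with a sorted list as the index on the join key."""

    def __init__(self, size):
        self.size = size
        self.fifo = deque()    # keys in arrival order, used for expiry
        self.sorted_keys = []  # the same keys in sorted order, used for probing

    def insert(self, key):
        self.fifo.append(key)
        bisect.insort(self.sorted_keys, key)
        if len(self.fifo) > self.size:
            # Expire the oldest key from both structures.
            old = self.fifo.popleft()
            del self.sorted_keys[bisect.bisect_left(self.sorted_keys, old)]

    def less_than(self, key):
        return self.sorted_keys[: bisect.bisect_left(self.sorted_keys, key)]

    def greater_than(self, key):
        return self.sorted_keys[bisect.bisect_right(self.sorted_keys, key):]


def inequality_join(events, window_size):
    """Join streams R and S on R.key > S.key.

    `events` is an iterable of (stream_name, key) pairs in arrival order.
    Each arriving tuple probes the opposite window's index, then joins its own window.
    """
    windows = {"R": WindowIndex(window_size), "S": WindowIndex(window_size)}
    results = []
    for stream, key in events:
        if stream == "R":
            results.extend((key, s) for s in windows["S"].less_than(key))
        else:
            results.extend((r, key) for r in windows["R"].greater_than(key))
        windows[stream].insert(key)
    return results
```

The sorted index lets each probe locate its matches with one binary search instead of scanning the whole opposite window, which is where an indexed approach would be expected to gain over a nested-loop comparison.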
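Abstract: Traditional fault identification techniques often suffer from low diagnostic accuracy in bearing fault diagnosis for large rotating machinery. To address this, a new data mining framework based on frequency-domain analysis is proposed: the associated frequency patterns mining framework (AFPMF), which consists of data preprocessing, associated frequent pattern set mining, and fault state monitoring. First, during data preprocessing, AFPMF partitions the mechanical vibration data stream into blocks in the time domain using time windows, and then applies the Fourier transform to convert each block from the time domain to the frequency domain and extract fault frequency features. Second, a compressed tree is built using a sliding-window-based associated frequent pattern tree, and the associated frequent pattern sets are derived, which completes the data mining step. Finally, potential faults are identified from the vibration frequencies that appear in the mining results, thereby monitoring the fault state. A comparison of experimental results for AFPMF and traditional methods in bearing fault diagnosis scenarios shows that AFPMF achieves better fault identification performance than the traditional approaches.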
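The preprocessing stage described above (time windows followed by a Fourier transform) can be sketched as follows. This is only an illustration under stated assumptions: windows are non-overlapping, a plain discrete Fourier transform is computed directly, and a simple counter stands in for the paper's sliding-window associated frequent pattern tree, which is not reproduced here. All function names are hypothetical.

```python
import cmath
import math
from collections import Counter


def dft_magnitudes(window):
    """Magnitudes of DFT bins 0..n/2 for one block of real-valued samples."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def dominant_bins(signal, window_len, top=2):
    """Split the signal into non-overlapping windows and keep, for each window,
    the set of the `top` strongest frequency bins (the DC bin 0 is ignored)."""
    patterns = []
    for start in range(0, len(signal) - window_len + 1, window_len):
        mags = dft_magnitudes(signal[start:start + window_len])
        ranked = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)
        patterns.append(frozenset(ranked[:top]))
    return patterns


def frequent_patterns(patterns, min_support):
    """Keep the frequency sets that occur in at least `min_support` windows."""
    counts = Counter(patterns)
    return {p: c for p, c in counts.items() if c >= min_support}
```

A bin index `k` corresponds to the physical frequency `k * sampling_rate / window_len`, so a frequency set that recurs across windows can be matched against known bearing fault frequencies.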