期刊文献+

大规模时间序列数据库降维及相似搜索 被引量:20

Dimensionality Reduction and Similarity Search in Large Time Series Databases
下载PDF
导出
摘要 提出一种基于分段多项式表示(PPR)的时间序列数据库相似查询的系统化方法.PPR是一类基于线性多项式回归的正交变换.用PPR变换索引时间序列数据在理论上具备非漏报性质.文中分析了PPR的计算复杂性以及查询阈值的下界,并提出了一种衡量时间序列相似查询算法之查询效率的定量指标.与基于离散傅立叶变换(DFT)和离散小波变换(DWT)的时间序列相似查询算法所作的对比实验表明,所提算法可以用低的索引结构维数获得高的查询效率. The problem of similarity search in time series databases has attracted much research interest in the database and data mining communities in the last decade. A systemic method of indexing and similarity searching in time series databases based on Piecewise Polynomial Representation (PPR) is proposed in this paper. The idea is to map each sub-sequence into a small set of multidimensional rectangles in feature space that is spanned by base of linear polynomial. PPR is a linear polynomial representation, and PAA (Piecewise Aggregate Approximation), an well known time series compression technique, is a special case of PPR. PPR is used as an efficient dimensionality reduction technique to permit similarity search over large time series databases without false dismissals. Computational complexity of PPR is O(n). The lower boundaries of search threshold are estimated, and a detailed performance anlysis of proposed method is presented. The experimental results demonstrate that performances of proposed method are superior to that of DFT (Discrete Fourier Transform) and DWT (Discrete Wavelet Transform) based index techniques.
作者 李爱国 覃征
出处 《计算机学报》 EI CSCD 北大核心 2005年第9期1467-1475,共9页 Chinese Journal of Computers
基金 陕西省科学技术发展计划"十五"攻关项目基金(2000K08G12)资助
关键词 数据库 时间序列 相似搜索 数据挖掘 查询 database time series similarity search data mining query
  • 相关文献

参考文献15

  • 1Keogh E., Kasetty S.. On the need for time series data mining benchmarks: A survey and empirical demonstration. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 2002, 102~111.
  • 2Agrawal R., Faloutsos C.,Swami A.. Efficient similarity search in sequence databases. In: Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms, Chicago, IL, 1993, 69~84.
  • 3Seidl T., Kriegel H.P.. Optimal multi-step k-nearest neighbor search. In: Proceedings of the SIGMOD, Washington, 1998, 154~165.
  • 4Faloutsos C., Ranganathan M., Manolopoulos Y.. Fast subsequence matching in time-series databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Minneapolis, MN, 1994, 419~429.
  • 5Rafiei D., Mendelzon A.. Querying time series data based on similarity. IEEE Transactions on Knowledge and Data Engineering, 2000, 12(5): 675~693.
  • 6Chan K.P., Fu A.W.. Efficient time series matching by wavelets. In: Proceedings of the 15th IEEE International Conference on Data Engineering, Sydney, Australia, 1999, 126~133.
  • 7Popivanov I., Miller R.. Similarity search over time-series data using wavelets. In: Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, 2002, 212~221.
  • 8Chu K.K.W., Wong M.H.. Fast time-series searching with scaling and shifting. In: Proceedings of the 18th ACM Symposium on Principles of Database Systems, Philadelphia, PA, 1999, 237~248.
  • 9Perng C.S., Wang H.X., Zhang S.R. et al.. Landmarks: A new model for similarity-based pattern in time series databases. In: Proceedings of the 16th IEEE International Conference on Data Engineering, EI Segundo, CA, 2000, 475~693.
  • 10Keogh E., Chakrabarti K., Pazzani M. et al.. Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems, 2001, 3(3): 263~286.

二级参考文献3

共引文献21

同被引文献253

引证文献20

二级引证文献156

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部