
Non-Equal Time Series Clustering Algorithm with Sliding Window STS Distance
Abstract: Time series clustering is an important step in analyzing and forecasting how quantities such as the search index of Internet search terms and the popularity of social-network topics change over time. Existing research on time series clustering has two shortcomings. First, studies in China and abroad work with time series divided into segments of equal length, which often loses important feature points and degrades the clustering results. Second, using raw observed values directly cannot accurately measure the shape similarity of time series. To address these problems, this paper first applies z-score standardization to remove the effect of differences in the order of magnitude of the observed values. It then designs a sliding-window STS (short time series) distance for time series of unequal length, together with a method for computing the center curve in a k-means-like clustering algorithm. Combining these, it proposes a clustering algorithm for non-equal-length time series based on the sliding-window STS distance. Extensive experiments on two real datasets collected from search engines and public data show that the proposed algorithm removes the effect of magnitude differences, handles time series of unequal length, and achieves better clustering accuracy and quality than existing algorithms.
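The abstract names three ingredients (z-score preprocessing, a sliding-window extension of the STS distance, and a k-means-like clustering step) but gives no formulas. The Python sketch below illustrates them under stated assumptions, and none of it is the authors' code. `sts_distance` follows the STS definition of Möller-Levet et al.: the square root of the summed squared differences between the slopes of consecutive segments. `sliding_sts_distance` is hypothetical: it slides the shorter series over the longer one and keeps the minimum distance. `cluster_medoids` is also hypothetical and swaps in a k-medoids-style update that uses a member series as each center, because the abstract does not describe the paper's center-curve computation.

```python
import math
from statistics import fmean, pstdev


def z_score(series):
    """Rescale a series to mean 0 and population std 1.

    A constant series has no spread, so it maps to all zeros.
    """
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(v - mu) / sigma for v in series]


def sts_distance(x, y, times=None):
    """STS distance: root of summed squared slope differences.

    Requires equal lengths; `times` defaults to unit spacing 0, 1, 2, ...
    """
    if len(x) != len(y):
        raise ValueError("sts_distance needs equal-length series")
    t = list(times) if times is not None else list(range(len(x)))
    total = 0.0
    for k in range(len(x) - 1):
        dt = t[k + 1] - t[k]
        slope_x = (x[k + 1] - x[k]) / dt
        slope_y = (y[k + 1] - y[k]) / dt
        total += (slope_y - slope_x) ** 2
    return math.sqrt(total)


def sliding_sts_distance(a, b):
    """HYPOTHETICAL unequal-length extension (not the paper's definition).

    Slides the shorter series over the longer one and returns the
    smallest STS distance among all aligned windows.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    m = len(short)
    return min(
        sts_distance(short, long_[i:i + m]) for i in range(len(long_) - m + 1)
    )


def cluster_medoids(series_list, k, iters=10):
    """HYPOTHETICAL k-medoids-style stand-in for the paper's k-means variant.

    Each center is a member series (medoid), so no center curve has to be
    averaged across unequal lengths. The first k series seed the centers.
    """
    data = [z_score(s) for s in series_list]
    centers = list(range(k))  # indices into `data`
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: attach every series to its nearest medoid.
        labels = [
            min(range(k), key=lambda c: sliding_sts_distance(s, data[centers[c]]))
            for s in data
        ]
        # Update step: the new medoid minimizes total in-cluster distance.
        new_centers = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # keep an emptied center
                continue
            new_centers.append(min(
                members,
                key=lambda i: sum(sliding_sts_distance(data[i], data[j]) for j in members),
            ))
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Rising and falling trends with different lengths and magnitudes.
series = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1],
          [10, 20, 30, 40, 50, 60, 70], [60, 50, 40, 30, 20, 10]]
print(cluster_medoids(series, k=2))  # → [0, 1, 0, 1]
```

In the usage example, z-score standardization makes the series 1..5 and 10..70 comparable despite their different magnitudes, and the sliding window lets series of length 5, 6, and 7 be compared at all. The medoid update is only a convenience for the sketch: a medoid is always an actual series, so it sidesteps averaging curves of different lengths, which is the part the paper addresses with its own center-curve method.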
Source: Journal of Frontiers of Computer Science and Technology (indexed in CSCD and the Peking University core journal list), 2015, No. 11, pp. 1301-1313 (13 pages).
Funding: National Natural Science Foundation of China (No. 61103006)
Keywords: clustering; time series; k-means algorithm

