Measurable Shapelets Extraction Based on Symbolic Representation for Time Series Classification

Abstract: In time series classification, shapelet extraction methods based on symbolic representation offer good classification accuracy and efficiency. However, measuring the quality of the symbols, for example by computing TF-IDF scores, is time-consuming and computationally expensive, which lowers classification efficiency. In addition, the number of extracted shapelet candidates remains large, and their discriminative power needs improvement. To address these problems, this paper proposes a measurable shapelet extraction method based on symbolic representation. The method comprises three stages (time series data preprocessing, determination of the shapelet candidate set, and shapelet learning) and can obtain high-quality shapelets quickly. In the preprocessing stage, each time series is transformed into a symbolic aggregate approximation (SAX) representation to reduce the dimensionality of the original series. In the candidate-determination stage, a Bloom filter is used to filter out repeated SAX words, and the remaining SAX words are stored in a hash table for quality measurement. The similarity of the SAX words is then assessed, and the final shapelet candidate set is determined based on the concepts of similarity and coverage. In the learning stage, a logistic regression model is used to learn the actual shapelets used for time series classification. Extensive experiments on 32 datasets show that the proposed method ranks second in both average classification accuracy and average classification efficiency. Compared with existing shapelet-based time series classification methods, the proposed method improves classification efficiency while maintaining accuracy, and it offers good interpretability.
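The first two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function and class names (`sax_word`, `BloomFilter`, `unique_sax_words`), the sliding-window extraction, the alphabet, the filter size and the SHA-256 based hashing are all assumptions made for illustration. The similarity, coverage and logistic-regression steps are not shown.

```python
import hashlib
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="abcd"):
    """Turn a numeric sequence into a SAX word: z-normalize, PAA, discretize."""
    n = len(series)
    if n < n_segments:
        raise ValueError("series must have at least n_segments points")
    mu, sigma = mean(series), pstdev(series)
    # Constant sequences have zero deviation; map them to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation over near-equal chunks.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints cut the standard normal into equiprobable regions.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


class BloomFilter:
    """Small Bloom filter (hypothetical parameters, one byte per bit for clarity)."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)

    def _positions(self, item):
        # Derive n_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def unique_sax_words(series, window, n_segments, alphabet="abcd"):
    """Slide a window over the series and keep each SAX word once.

    Returns a dict (the hash table) mapping each kept word to the offset
    where it was first seen. A Bloom filter can give false positives, so
    a genuinely new word may occasionally be skipped.
    """
    seen = BloomFilter()
    table = {}
    for start in range(len(series) - window + 1):
        word = sax_word(series[start:start + window], n_segments, alphabet)
        if word in seen:
            continue  # probably a repeat; skip it
        seen.add(word)
        table[word] = start
    return table
```

For example, `unique_sax_words([0, 0, 0, 0, 10, 10, 10, 10] * 3, window=8, n_segments=2, alphabet="ab")` scans 17 windows but keeps at most the four possible two-letter words, which is the kind of candidate reduction the filtering step is meant to provide.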

Authors: WANG Liqin; WAN Yuan; LUO Ying (School of Science, Wuhan University of Technology, Wuhan 430070, China)

Affiliation: School of Science, Wuhan University of Technology

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 8, pp. 106-116 (11 pages)

Funding: Fundamental Research Funds for the Central Universities (2021III030JC).

Keywords: Time series classification; Shapelet; SAX representation; Bloom filter; Logistic regression

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

