Abstract
Mining all-chain sets in time series is an emerging research area. To the best of our knowledge, no algorithm has yet been proposed for mining all-chain sets over multi-scale nearest time series. This paper studies that problem and, building on the existing LRSTOMP and ALLC algorithms, proposes MTSC (Mining Time Series All-Chain Sets over Multi-scale Nearest Time Series), a mining algorithm based on incremental computation. MTSC first applies LRSTOMP and then ALLC to the first nearest time series member to obtain the all-chain set on that member, and retains the PL and PR structures associated with it. From the second member onward, the LRSTOMP step only needs to process the part of the current member that is new relative to the previous member; combining this with the previous member's PL and PR yields the current member's PL and PR incrementally, and ALLC is then applied to obtain the all-chain set on the current member. Compared with the Naive approach, which runs LRSTOMP and ALLC on the full content of every nearest time series member, MTSC avoids recomputation over all of the data and therefore runs faster. Simulation experiments on the public datasets Penguin and TiltABP verify the effectiveness of the algorithm: its mining results are identical to those of the Naive algorithm, and on these datasets it improves time efficiency by 80%~88.3% at the cost of a 1.1%~9.7% increase in space overhead.
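The incremental idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' MTSC, LRSTOMP, or ALLC implementation. It assumes that PL and PR hold, for each subsequence, the distance and index of its nearest neighbour to the left and to the right; that a member extends the previous one by appending points at the end; and that plain Euclidean distance with a brute-force search is acceptable for illustration. The function names and the parameters `m` (subsequence length) and `excl` (exclusion zone) are hypothetical.

```python
import math

def dist(ts, i, j, m):
    # Plain (not z-normalized) Euclidean distance between the
    # length-m subsequences starting at i and j; a simplification.
    return math.dist(ts[i:i + m], ts[j:j + m])

def left_right_naive(ts, m, excl):
    # Brute-force stand-in for a left/right profile computation.
    # Returns (PL, PR) as lists of (distance, index); index -1 means
    # "no admissible neighbour".
    k = len(ts) - m + 1
    PL = [(math.inf, -1)] * k
    PR = [(math.inf, -1)] * k
    for i in range(k):
        for j in range(k):
            if abs(i - j) <= excl:      # skip trivial matches
                continue
            d = dist(ts, i, j, m)
            if j < i and d < PL[i][0]:
                PL[i] = (d, j)
            if j > i and d < PR[i][0]:
                PR[i] = (d, j)
    return PL, PR

def left_right_incremental(ts, m, excl, PL_old, PR_old):
    # Extend profiles computed on a prefix of ts to the whole of ts.
    # Only pairs that involve a new subsequence are evaluated:
    # old left neighbours cannot change, old right neighbours can only
    # be replaced by a new subsequence.
    k_old = len(PL_old)
    k = len(ts) - m + 1
    PL = list(PL_old) + [(math.inf, -1)] * (k - k_old)
    PR = list(PR_old) + [(math.inf, -1)] * (k - k_old)
    for j in range(k_old, k):           # j: a new subsequence
        for i in range(j - excl):       # i: any earlier subsequence
            d = dist(ts, i, j, m)
            if d < PL[j][0]:
                PL[j] = (d, i)
            if d < PR[i][0]:
                PR[i] = (d, j)
    return PL, PR

def all_chains(PL, PR):
    # Simplified chain extraction: follow links that are mutual, i.e.
    # the right neighbour of i is j and the left neighbour of j is i.
    seen, chains = set(), []
    for start in range(len(PR)):
        if start in seen:
            continue
        chain, i = [start], start
        while True:
            j = PR[i][1]
            if j == -1 or PL[j][1] != i:
                break
            chain.append(j)
            seen.add(j)
            i = j
        chains.append(chain)
    return chains
```

Under these assumptions, calling `left_right_naive` on the first member, then `left_right_incremental` on each longer member with the previous PL and PR, gives the same profiles as recomputing from scratch, while evaluating only the distance pairs that involve the newly added subsequences.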
Authors
王少鹏
冯淳恺
WANG Shaopeng; FENG Chunkai (School of Software Engineering, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia Engineering Research Center of Ecological Big Data, Ministry of Education, Hohhot 010021, China; Inner Mongolia Engineering Laboratory for Cloud Computing and Service, Hohhot 010021, China; Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory, Hohhot 010021, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2024, Issue 10, pp. 247-260 (14 pages)
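Funding
National Natural Science Foundation of China (62066034, 62262047)
Inner Mongolia Science and Technology Program (61862047)
Open Project Fund of the Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory (IMDBD2020011)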
Keywords
Time series
Content evolution
Time series chain
All-chain set
Incremental calculation