LSH-Based Query Algorithm for Time-Series Subsequences (基于LSH的时间子序列查询算法). Cited by: 6

Similarity Query of Time Series Sub-sequences Based on LSH

Abstract: Subsequence similarity query, which includes range queries and k-nearest-neighbor (kNN) queries, is an important operation on time series datasets. Most existing algorithms are based on Euclidean distance or DTW distance, and their weakness is low query efficiency. This paper proposes a new distance measure based on Locality Sensitive Hash (LSH) that greatly improves the efficiency of similarity queries while preserving the quality of the query results. On this basis, it presents an index structure named DS-Index, which prunes query candidates using a distance lower bound, and further proposes two optimized algorithms, OLSH-Range and OLSH-kNN. Experiments on real stock sequence datasets show that the algorithms find similarity query results quickly and accurately.
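The abstract does not give the details of the LSH-based distance measure, DS-Index, OLSH-Range or OLSH-kNN, so the following is only a minimal sketch of the general idea of hashing subsequences and then verifying candidates. It assumes a standard p-stable (Gaussian random projection) LSH over fixed-length sliding windows, with Euclidean distance as the final check. The class `SubsequenceLSH` and all of its parameters (`num_hashes`, `bucket_width`, `seed`) are hypothetical names chosen for illustration and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def sliding_windows(series, w):
    """Yield (start, window) for every length-w subsequence of series."""
    for start in range(len(series) - w + 1):
        yield start, series[start:start + w]


class SubsequenceLSH:
    """Toy p-stable LSH index over fixed-length subsequences.

    Hypothetical illustration only; this is NOT the paper's DS-Index.
    """

    def __init__(self, w, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.bucket_width = bucket_width
        # One Gaussian projection vector and one random offset per hash function.
        self.projections = [[rng.gauss(0.0, 1.0) for _ in range(w)]
                            for _ in range(num_hashes)]
        self.offsets = [rng.uniform(0.0, bucket_width)
                        for _ in range(num_hashes)]
        self.buckets = defaultdict(list)
        self.series = None

    def _key(self, window):
        # h(v) = floor((a . v + b) / r), concatenated over all hash functions.
        return tuple(
            math.floor(
                (sum(a * x for a, x in zip(proj, window)) + b)
                / self.bucket_width
            )
            for proj, b in zip(self.projections, self.offsets)
        )

    def build(self, series):
        """Hash every length-w window of the series into a bucket."""
        self.series = series
        for start, window in sliding_windows(series, self.w):
            self.buckets[self._key(window)].append(start)

    def range_query(self, query, radius):
        """Return start offsets from the query's bucket whose Euclidean
        distance to the query is at most `radius`."""
        hits = []
        for start in self.buckets.get(self._key(query), []):
            window = self.series[start:start + self.w]
            if math.dist(window, query) <= radius:
                hits.append(start)
        return hits


if __name__ == "__main__":
    series = [math.sin(i / 5.0) for i in range(200)]
    index = SubsequenceLSH(w=16)
    index.build(series)
    print(index.range_query(series[40:56], radius=0.5))
```

In this sketch only the windows that share the query's bucket are compared exactly, which is where the speed-up over a full scan would come from. A window identical to the query always lands in the same bucket, but other windows within the radius can be missed, so the result is approximate; practical LSH schemes typically use several hash tables to raise recall.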

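Authors: Tang Chunlei (汤春蕾), Dong Jiaqi (董家麒)

Affiliation: School of Computer Science, Fudan University

Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD and the PKU Core Journals list; 2012, No. 11, pp. 2228-2236 (9 pages)

Funding: Supported by the Shanghai Leading Academic Discipline Project (B114)

Keywords: similarity query; time-series databases; subsequence; Locality Sensitive Hash (LSH); index

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]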
