
A Massively Parallel Bayesian Approach to Factorization-Based Analysis of Big Time Series Data (Cited by: 9)

Abstract: Big time series data record the evolution of a complex system (or systems) over large temporal and spatial scales, capturing in detail the interactions and interconnections among different parts of the system. Extracting the latent low-dimensional factors from such data plays a crucial role in examining the overall mechanism of the underlying complex system. Research challenges arise from the lack of a priori knowledge, and most conventional factor analysis methods cannot adapt to the ultra-high dimensionality and scale of big data. To address this challenge, this study develops a massively parallel Bayesian approach (G-BF) to factorization-based analysis of tensors formed by massive time series. The approach relies on a Bayesian algorithm to derive the factor matrices in the absence of a priori information; the algorithm is then mapped to the compute unified device architecture (CUDA) model so that the factor matrices are updated in a massively parallel manner. The approach supports factorization of tensors of arbitrary dimension. Experimental results indicate that: 1) compared with the GPU-accelerated factorization algorithm G-HALS (GPU-hierarchical alternating least squares), G-BF exhibits much better runtime performance, and its superiority becomes more pronounced as the data scale grows; 2) G-BF scales well in terms of data volume, rank, and dimension; 3) applied within the existing framework for fusing sub-factors (hierarchical-parallel factor analysis, H-PARAFAC), G-BF makes it possible to factorize a "huge" tensor as a whole (up to 10^11 data elements over two nodes), a capacity two orders of magnitude beyond that of conventional methods.
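
The core computation outlined in the abstract is deriving the factor matrices of a (possibly high-order) tensor and updating them iteratively in a massively parallel fashion. As a rough illustration only, the NumPy sketch below fits a 3-way CP (PARAFAC-style) model x[i,j,k] ≈ Σ_r A[i,r]B[j,r]C[k,r] using regularized alternating least-squares updates; it is not the paper's Bayesian inference or its CUDA kernels, and the helper names (unfold, khatri_rao, cp_als) are hypothetical, introduced here for illustration.

```python
# Illustrative sketch only (not the paper's G-BF algorithm): a minimal CP
# (PARAFAC-style) factorization of a 3-way tensor via regularized alternating
# least squares, showing the per-mode factor-matrix update structure that a
# massively parallel (e.g. CUDA) implementation would distribute.
import numpy as np


def unfold(x, mode):
    """Matricize a tensor along `mode` (remaining modes keep their order)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def khatri_rao(a, b):
    """Column-wise Khatri-Rao product of two factor matrices."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)


def cp_als(x, rank, n_iter=100, reg=1e-3, seed=0):
    """Fit x[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r] for a 3-way tensor x."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])  # matches unfold() ordering
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            # Row-wise independent least-squares update of one factor matrix.
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.inv(gram + ridge)
    return factors


if __name__ == "__main__":
    # Sanity check: recover a synthetic rank-5 tensor of shape 10 x 12 x 14.
    rng = np.random.default_rng(1)
    true = [rng.standard_normal((d, 5)) for d in (10, 12, 14)]
    x = np.einsum("ir,jr,kr->ijk", *true)
    a, b, c = cp_als(x, rank=5)
    approx = np.einsum("ir,jr,kr->ijk", a, b, c)
    print("relative error:", np.linalg.norm(x - approx) / np.linalg.norm(x))
```

Note that each per-mode update recomputes every row of one factor matrix independently of the others; this row-wise independence is the kind of data-parallel structure that a CUDA mapping such as the one described in the abstract can exploit.
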
Authors: Gao Tengfei; Liu Yongyan; Tang Yunbo; Zhang Lei; Chen Dan (School of Computer Science, Wuhan University, Wuhan 430072)
Source: Journal of Computer Research and Development (计算机研究与发展; indexed in EI, CSCD, PKU Core), 2019, No. 7, pp. 1567-1577 (11 pages)
Funding: National Natural Science Foundation of China (61772380); Innovation Group Project of the Natural Science Foundation of Hubei Province (2017CFA007)
Keywords: Bayesian model; big time series data; tensor factorization; massively parallel computing; compute unified device architecture (CUDA)

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部