Abstract
The understanding of big data has evolved through three stages: the "3Vs" (volume, variety, and velocity), the "4Vs" (the "3Vs" plus value), and the current "5Vs" (the "4Vs" plus veracity). Against this background, this paper first analyzes the "5Vs" characteristics of big data in the process industries. It then reviews existing data modeling methods and discusses their limitations when applied to industrial big data, in light of three properties specific to process industrial big data: multi-layer irregular sampling, multiple temporal and spatial time series, and the mixing-in of non-veracious data such as outliers. Finally, it discusses open research problems in process industrial big data modeling, including: i) latent structure modeling of multi-layer irregularly sampled data; ii) multiple temporal and spatial time-series data modeling for event discovery, decision-making, and causality analysis; iii) robust modeling of data containing non-veracious samples; and iv) computing architectures and methods for large-volume data that support real-time modeling.
Source
Acta Automatica Sinica (《自动化学报》)
EI
CSCD
Peking University Core Journals
2016, No. 2, pp. 161-171 (11 pages)
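Funding
Supported by the National Natural Science Foundation of China (61304107, 61490704, 61573022, 61290323, 61203102), the China Postdoctoral Science Foundation (2013M541242), the International Postdoctoral Exchange Fellowship Program (20130020), and the Fundamental Research Funds for the Central Universities (N130408002, N130108001)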
Keywords
Process industrial big data
multi-layer data latent structure modeling
multiple temporal and spatial time-series data modeling
big data computing framework