Abstract
Stream cube computation is an important foundation for multidimensional analysis of data streams. However, the dynamic, unbounded, and bursty nature of data streams, together with the complexity of multidimensional data structures, poses great challenges in storage space, update efficiency, and adaptability. In many applications, users focus on only a portion of the views. Based on this observation, this paper proposes a computing method based on an interesting view subset. The interesting view subset and the interesting paths are determined from historical query information, and they are updated over time if query-answering efficiency decreases. A Stream-Tree structure is defined to materialize, in main memory, the cells of the interesting view subset and the drilling paths. At run time, the cells of the Stream-Tree are continuously updated as new tuples arrive, and old cells are deleted periodically according to the constraints of multi-level time windows. Sparse cells of the Stream-Tree are not divided into finer ones; only their high-level aggregations are preserved. Experiments and analysis indicate that the method can maintain the stream cube cells of the current time window in limited main memory, while supporting fast updates of the storage structure and fast responses to user queries.
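The abstract does not give the internal layout of the Stream-Tree, so the following Python sketch is only one possible reading of the ideas it names. It assumes a single drilling path ordered from coarse to fine, one aggregated sum per time slot in each cell, a per-depth window length as a stand-in for the multi-level time windows, and a tuple-count threshold as the sparsity test. The names `StreamTreeNode`, `StreamTree`, `windows`, and `min_count` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


class StreamTreeNode:
    """One aggregated cell; children refine it along the next dimension."""

    def __init__(self):
        self.slots = defaultdict(float)  # time slot -> aggregated measure
        self.counts = defaultdict(int)   # time slot -> number of tuples seen
        self.children = {}               # dimension value -> StreamTreeNode

    def total(self):
        return sum(self.slots.values())

    def tuple_count(self):
        return sum(self.counts.values())


class StreamTree:
    """Illustrative tree over one drilling path (assumed design, not the paper's)."""

    def __init__(self, path, windows, min_count=2):
        # path: dimension names ordered coarse to fine (assumed interest path).
        # windows: number of time slots kept at each depth, root first (assumed policy).
        assert len(windows) == len(path) + 1
        self.path = path
        self.windows = windows
        self.min_count = min_count
        self.root = StreamTreeNode()

    def insert(self, record, measure, slot):
        """Aggregate one tuple into the root and into every dense cell on its path."""
        node = self.root
        self._add(node, measure, slot)
        for dim in self.path:
            # Sparse cell: keep only this higher-level aggregate, do not refine.
            if node.tuple_count() < self.min_count:
                break
            node = node.children.setdefault(record[dim], StreamTreeNode())
            self._add(node, measure, slot)

    @staticmethod
    def _add(node, measure, slot):
        node.slots[slot] += measure
        node.counts[slot] += 1

    def expire(self, current_slot):
        """Drop time slots that fall outside the window of each depth."""
        self._expire(self.root, 0, current_slot)

    def _expire(self, node, depth, current_slot):
        oldest = current_slot - self.windows[depth] + 1
        for slot in [s for s in node.slots if s < oldest]:
            del node.slots[slot]
            del node.counts[slot]
        for key in list(node.children):
            child = node.children[key]
            self._expire(child, depth + 1, current_slot)
            if not child.slots and not child.children:
                del node.children[key]  # cell is empty, release its memory

    def query(self, values):
        """Return the aggregate of the cell addressed by values, or None if absent."""
        node = self.root
        for value in values:
            node = node.children.get(value)
            if node is None:
                return None
        return node.total()
```

In this sketch a finer cell only accumulates tuples that arrive after its parent has become dense, so its total can be smaller than the parent's. Giving coarser levels longer windows means detailed cells are discarded first, which is one simple way to bound memory while keeping high-level aggregates available.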
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journals
2011, Issue 12, pp. 2369-2378 (10 pages)
Funding
National Natural Science Foundation of China (70771110)
Keywords
data stream
stream cube
multidimensional analysis
interesting view subset
multi-level time windows