Abstract
Software frameworks such as Hadoop and Spark provide technical support for fast, parallel processing of big data. At the same time, the big data environment requires OLAP to deliver quasi-real-time and real-time responses. The data cube is the multidimensional data model abstraction underlying OLAP. The varied analyses that big data calls for make data cubes high-dimensional, and the sheer volume of data causes them to expand. Describing a data cube as a directed graph provides the complete set of data slices and data blocks available for analysis, and extracting a single element from that set improves the efficiency of data analysis. For high-dimensional data cubes, dimension reduction is used to control the cube's size. Based on how often and in what way each dimension is used, the concepts of dispensable dimension, necessary dimension, and joint dimension are proposed, a method for identifying each kind of dimension is given, and a method for adjusting and simplifying the data cubes involved is implemented.
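The directed-graph view of a data cube can be illustrated with the standard cuboid lattice, in which each node is a subset of the dimensions and each edge drops one dimension by aggregating it away. The sketch below is a generic illustration, not the authors' construction: the function name `cuboid_lattice` and the dimension names `time`, `region`, and `product` are hypothetical. It also shows why removing a dimension controls cube size, since a cube with n dimensions has 2^n cuboids.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Build the cuboid lattice of a data cube as a directed graph.

    Nodes are frozensets of dimension names (one per cuboid).
    Each edge (parent, child) drops exactly one dimension from parent,
    i.e. child can be computed by aggregating parent over that dimension.
    This is the textbook lattice, assumed here for illustration only.
    """
    dims = tuple(dimensions)
    nodes = [
        frozenset(combo)
        for k in range(len(dims) + 1)
        for combo in combinations(dims, k)
    ]
    edges = [(parent, parent - {d}) for parent in nodes for d in parent]
    return nodes, edges


# Hypothetical three-dimensional cube.
nodes, edges = cuboid_lattice(["time", "region", "product"])
print(len(nodes), len(edges))

# Dropping one dimension halves the number of cuboids.
reduced_nodes, reduced_edges = cuboid_lattice(["time", "region"])
print(len(reduced_nodes), len(reduced_edges))
```

With three dimensions the lattice has 8 cuboids and 12 edges; with two it has 4 cuboids and 4 edges. How the paper decides which dimensions are dispensable, necessary, or joint is not stated in the abstract, so that step is left out of the sketch.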
Authors
ZHANG Yan (张岩)
LYU Mengru (吕梦儒)
Computer and Basic Mathematics Education Department, Shenyang Normal University, Shenyang 110034, China
Source
Journal of Shenyang Normal University: Natural Science Edition (《沈阳师范大学学报(自然科学版)》)
CAS
2018, No. 1, pp. 77-81 (5 pages)
Funding
Natural Science Foundation of Liaoning Province (Grant No. 2015020055)