Abstract
As data volumes grow across every field, data faultage has become a pressing problem for data warehouse technology in managing massive data resources. The concept of a fault comes from geology, where it describes a structure in which rock fractures, with obvious displacement on both sides, as a result of reservoir inhomogeneity; it is of great practical significance for energy extraction and earthquake prevention. Drawing on the theory of geological faults, this paper introduces a series of data faultage concepts to define the tendency toward local displacement between data, and for the first time gives a knowledge description of the various data inhomogeneity phenomena in data warehouses from both macro and micro viewpoints. Through analysis of data faultage sections, it systematically describes the data faultage phenomena that arise during data preprocessing and presents rules and algorithms for the mutual transformation of data faultage between explicit and implicit faults and between inner and inter faults. This lays an initial foundation for a theoretical system of data faultage, and experiments validate the effectiveness of the theory.
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 8, pp. 9-13, 77 (6 pages)
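Funding
National Natural Science Foundation of China (40976108)
Shanghai Leading Academic Discipline Project (J50103)
Shanghai University Graduate Innovation Fund (SHUCX070037, SHUCX120105)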
Keywords
Data faultage
Inhomogeneity
Explicit and implicit faults
Inner and inter faults