Data warehouses provide storage and management for massive volumes of data, but data schemas evolve over time. When a data schema is changed, extended, or reduced, the data in the warehouse must comply with the changed schema, so the warehouse must be reorganized or reconstructed, a process that is exhausting and wasteful. To cope with these problems, this paper develops an approach to modeling data cubes with XML, which has emerged as a universal format for data exchange on the Web and which can make data warehouses flexible and scalable. The paper also extends the OLAP algebra to XML-based data cubes, an extension called X-OLAP.
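As a rough illustration of the idea of an XML-modeled cube (not the paper's X-OLAP algebra), the following Python sketch stores cube cells as XML elements and performs a simple roll-up over one dimension; the element and attribute names are hypothetical.

```python
# A minimal sketch of an XML-modeled data cube, assuming a hypothetical
# schema with <cell> elements carrying dimension attributes and a measure.
import xml.etree.ElementTree as ET
from collections import defaultdict

cube_xml = """
<cube name="sales">
  <cell region="East" year="2003" amount="120"/>
  <cell region="East" year="2004" amount="150"/>
  <cell region="West" year="2003" amount="90"/>
</cube>
"""

root = ET.fromstring(cube_xml)

def rollup(root, dim):
    """Aggregate the 'amount' measure over one dimension attribute."""
    totals = defaultdict(float)
    for cell in root.iter("cell"):
        totals[cell.get(dim)] += float(cell.get("amount"))
    return dict(totals)

print(rollup(root, "region"))  # {'East': 270.0, 'West': 90.0}
```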
For a data cube there are always constraints between dimensions or among the attributes of a dimension, such as functional dependencies. We introduce the problem of how to use functional dependencies, when they exist, to speed up the computation of sparse data cubes. A new algorithm, CFD (Computation by Functional Dependencies), is presented to meet this demand. CFD determines the order of dimensions by considering the cardinalities of dimensions together with the functional dependencies between them, thus reducing the number of partitions for such dimensions. CFD also combines bottom-up partitioning with top-down aggregate computation to speed up the computation further. CFD can efficiently compute a data cube with hierarchies in a dimension, from the finest granularity to the coarsest.
Key words: sparse data cube; functional dependency; dimension; partition; CFD. CLC number: TP 311. Foundation item: Supported by the E-Government Project of the Ministry of Science and Technology of China (2001BA110B01). Biography: Feng Yu-cai (1945-), male, Professor; research direction: database systems.
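A minimal sketch of how a functional dependency can prune partitioning work, under the assumption (ours, not necessarily CFD's exact rule) that a dimension determined by another never needs its own partitioning pass:

```python
# Sort dimensions by cardinality, but when A -> B holds, partitioning on A
# already partitions B, so B can be skipped. Names and data are hypothetical.
def order_dimensions(cardinalities, fds):
    """cardinalities: {dim: distinct-count}; fds: set of (A, B) with A -> B."""
    determined = {b for (_, b) in fds}
    # Partition only on dimensions not determined by another one;
    # higher cardinality first prunes sparse partitions sooner.
    free = sorted((d for d in cardinalities if d not in determined),
                  key=cardinalities.get, reverse=True)
    return free, sorted(determined)

free, skipped = order_dimensions(
    {"store": 1000, "city": 50, "product": 200}, {("store", "city")})
print(free, skipped)  # ['store', 'product'] ['city']
```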
This paper investigates how to integrate Web data into a multidimensional data warehouse (cube) for comprehensive on-line analytical processing (OLAP) and decision making. An approach to Web-data-based cube construction is proposed, which includes Web data modeling based on MIX (Metadata-based Integration model for data X-change), the design of generic and specific mapping rules, and a transformation algorithm for mapping Web data to a multidimensional array. In addition, the structure and implementation of a prototype Web-data-based cube are discussed.
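The last step, mapping records extracted from Web sources into a multidimensional array, can be sketched as follows; the record fields and dimensions are hypothetical and do not reflect the MIX model itself.

```python
# A minimal sketch of mapping semi-structured Web records into a
# multidimensional array indexed by dimension coordinates.
import numpy as np

records = [
    {"site": "a.com", "month": "Jan", "hits": 10},
    {"site": "a.com", "month": "Feb", "hits": 12},
    {"site": "b.com", "month": "Jan", "hits": 7},
]

sites = sorted({r["site"] for r in records})    # dimension 1 members
months = sorted({r["month"] for r in records})  # dimension 2 members
cube = np.zeros((len(sites), len(months)))

for r in records:
    cube[sites.index(r["site"]), months.index(r["month"])] += r["hits"]

print(cube)  # rows: site, columns: month
```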
The technological landscape for managing big Earth observation (EO) data ranges from global solutions on large cloud infrastructures with web-based access to self-hosted implementations. EO data cubes are a leading technology for facilitating big EO data analysis and can be deployed at different spatial scales: local, national, regional, or global. Several EO data cubes with a geographic focus ("local EO data cubes") have been implemented. However, their alignment with the Digital Earth (DE) vision and the benefits and trade-offs in creating and maintaining them ought to be further examined. We investigate local EO data cubes from five perspectives (science, business and industry, government and policy, education, communities and citizens) and illustrate four examples covering three continents at different geographic scales (Swiss Data Cube, semantic EO data cube for Austria, DE Africa, Virginia Data Cube). A local EO data cube can benefit many stakeholders and players but requires several technical developments, including local EO data cubes built on public, global, and cloud-native EO data streaming, as well as interoperability between local EO data cubes. We argue that blurring the dichotomy between global and local aligns with the DE vision to access the world's knowledge and explore information about the planet.
Pressures on natural resources are increasing, and a number of challenges must be overcome to meet the needs of a growing population in a period of environmental variability. Some of these environmental issues can be monitored using remotely sensed Earth Observation (EO) data that are increasingly available from a number of freely and openly accessible repositories. However, the full information potential of EO data has not yet been realized. The data remain underutilized, mainly because of their complexity, increasing volume, and the lack of efficient processing capabilities. EO Data Cubes (DC) are a new paradigm aiming to realize the full potential of EO data by lowering the barriers caused by these Big Data challenges and providing access to large spatio-temporal data in an analysis-ready form. Systematic and regular provision of Analysis Ready Data (ARD) will significantly reduce the burden on EO data users. Nevertheless, ARD are not commonly produced by data providers, so obtaining uniform and consistent ARD remains a challenging task. This paper presents an approach to enable rapid data access and pre-processing to generate ARD using interoperable service chains. The approach has been tested and validated by generating Landsat ARD while building the Swiss Data Cube.
Avoiding, reducing, and reversing land degradation and restoring degraded land is an urgent priority to protect the biodiversity and ecosystem services that are vital to life on Earth. To halt and reverse the current trends in land degradation, there is an immediate need to enhance national capacities to undertake quantitative assessments and mapping of degraded lands, as required by the Sustainable Development Goals (SDGs), in particular SDG indicator 15.3.1 ("proportion of land that is degraded over total land area"). Earth Observations (EO) can play an important role both in generating this indicator and in complementing or enhancing national official data sources. Implementations like Trends.Earth that monitor land degradation in accordance with SDG 15.3.1 rely on default datasets of coarse spatial resolution provided by MODIS or AVHRR. Consequently, there is a need to develop methodologies that benefit from medium- to high-resolution satellite EO data (e.g. Landsat or Sentinel). In response to this issue, this paper presents an initial overview of an innovative approach to monitoring land degradation at the national scale in compliance with the SDG 15.3.1 indicator, using Landsat observations organized in a data cube; further work is required to improve the calculation of the three sub-indicators.
QC-Tree is one of the most storage-efficient structures for data cubes in a MOLAP system. Although QC-Tree can achieve a high compression ratio, it is still a fully materialized data cube. In this paper, an improved structure, PMC, is presented that allows us to materialize only a part of the cells in a QC-Tree to save more storage space. There is a notable difference between our partial materialization algorithm and traditional materialized-view selection algorithms. In a traditional algorithm, when a view is selected, all the cells in that view are materialized; if a view is not selected, none of its cells are materialized. This strategy results in unstable query performance. The presented algorithm, however, selects and materializes data at the cell level, and, along with further reduced space and update cost, it can ensure stable query performance. A series of experiments are conducted on both synthetic and real data sets. The results show that PMC can further reduce the storage space occupied by the data cube and can shorten the time needed to update the cube.
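To make the cell-level idea concrete, here is a hedged sketch in which individual cells are chosen greedily under a space budget; the frequency-based benefit rule is an illustrative assumption, not the PMC selection algorithm.

```python
# Cell-level (rather than view-level) materialization: each candidate cell
# is kept or dropped on its own, so query cost degrades gradually instead
# of cliff-dropping at view boundaries.
def select_cells(cells, budget):
    """cells: list of (cell_id, size, access_freq); budget: max total size."""
    chosen, used = set(), 0
    # Greedy by benefit density: frequently hit cells first.
    for cid, size, freq in sorted(cells, key=lambda c: c[2] / c[1], reverse=True):
        if used + size <= budget:
            chosen.add(cid)
            used += size
    return chosen

cells = [("(East,*)", 4, 90), ("(*,2003)", 8, 60), ("(West,*)", 4, 5)]
print(select_cells(cells, 10))  # {'(East,*)', '(West,*)'}
```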
The results of a data cube occupy a huge amount of disk space when the base table has a large number of attributes. New types of data cube, compact data cubes such as the condensed cube and the quotient cube, were proposed to solve this problem. They compress the data cube dramatically. However, their query cost is so high that they cannot be used in most applications. This paper introduces the semi-closed cube, which reduces the size of the data cube while achieving almost the same query response time as the full data cube. The semi-closed cube is a generalization of the condensed cube and the quotient cube and is constructed from a quotient cube. When the query cost of the quotient cube is higher than a given threshold, the semi-closed cube selects some views and picks a fellow for each of them. All the tuples of those views are materialized except those closed by their fellows. To find a tuple of those views, users only need to scan the view and its fellow; thus, query performance is improved. Experiments were conducted using a real-world data set. The results show that the semi-closed cube is an effective data cube representation.
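One way to picture the view-plus-fellow lookup is the following toy sketch, in which tuples derivable from a fellow are simply not stored in the view; the containment rule here is a deliberate oversimplification of the semi-closed cube.

```python
# Probe the materialized view first, then fall back to its fellow, which
# covers the tuples that were "closed" and therefore not stored.
def lookup(key, view, fellow):
    """view/fellow: dicts from group-by key to aggregate value."""
    if key in view:            # materialized directly in the view
        return view[key]
    return fellow.get(key)     # tuple was closed by the fellow

view = {("East", "2003"): 120}                 # non-closed tuples only
fellow = {("West", "2003"): 90}                # covers the omitted tuples
print(lookup(("West", "2003"), view, fellow))  # 90
```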
The effort and cost required to convert satellite Earth Observation (EO) data into meaningful geophysical variables has prevented the systematic analysis of all available observations. To overcome these problems, we utilise an integrated High Performance Computing and Data environment to rapidly process, restructure and analyse the Australian Landsat data archive. In this approach, the EO data are assigned to a common grid framework that spans the full geospatial and temporal extent of the observations: the EO Data Cube. The approach is pixel-based and incorporates geometric and spectral calibration and quality assurance of each Earth surface reflectance measurement. We demonstrate the utility of the approach with rapid time-series mapping of surface water across the entire Australian continent using 27 years of continuous, 25 m resolution observations. Our preliminary analysis of the Landsat archive shows how the EO Data Cube can effectively liberate high-resolution EO data from their complex, sensor-specific data structures and revolutionise our ability to measure environmental change.
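The core move, snapping every observation onto a common grid, can be sketched in a few lines; the origin and cell size below are illustrative, not the actual Australian grid specification.

```python
# A minimal sketch of assigning observations to a fixed spatial grid,
# the basic operation behind a pixel-based EO Data Cube.
def to_grid_index(lon, lat, origin=(110.0, -45.0), cell_deg=0.00025):
    """Map a coordinate to integer (col, row) indices on a common grid
    (0.00025 degrees is roughly 25 m at mid latitudes)."""
    col = int((lon - origin[0]) / cell_deg)
    row = int((lat - origin[1]) / cell_deg)
    return col, row

print(to_grid_index(133.5, -23.7))
```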
Data cube computation is an important problem in the field of data warehousing and OLAP (online analytical processing). Although it has been studied extensively in the past, most of its algorithms are designed without considering CPU and cache behavior. In this paper, we first propose a cache-conscious cubing approach called CC-Cubing to efficiently compute data cubes on a modern processor. This method enhances CPU and cache performance. It adopts an integrated depth-first and breadth-first partitioning order and partitions multiple dimensions simultaneously. The partitioning scheme improves data spatial locality and increases the utilization of cache lines. Software prefetching techniques are then applied in the sorting phase to hide the expensive cache misses associated with data scans. In addition, a cache-aware method is used in CC-Cubing to switch the sort algorithm dynamically. Our performance study shows that CC-Cubing outperforms BUC, Star-Cubing and MM-Cubing in most cases. Then, in order to fully utilize an SMT (simultaneous multithreading) processor, we present a thread-based CC-Cubing-SMT method. This parallel method provides an improvement of up to 27% over the single-threaded CC-Cubing algorithm.
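For readers unfamiliar with the BUC family that CC-Cubing is benchmarked against, the following toy shows the recursive partitioning shape of bottom-up cube computation; all cache-conscious ordering, prefetching and SMT aspects are omitted.

```python
# A toy bottom-up cubing recursion: partition the input on one dimension
# at a time and emit the aggregate of every group-by cell.
from itertools import groupby

def buc(rows, dims, prefix=(), out=None):
    """rows: list of (d1, ..., dk, measure); dims: indices left to expand."""
    out = {} if out is None else out
    out[prefix] = sum(r[-1] for r in rows)          # aggregate this cell
    for i, d in enumerate(dims):
        rows_sorted = sorted(rows, key=lambda r: r[d])
        for _, grp in groupby(rows_sorted, key=lambda r: r[d]):
            part = list(grp)
            buc(part, dims[i + 1:], prefix + ((d, part[0][d]),), out)
    return out

rows = [("East", "2003", 120), ("East", "2004", 150), ("West", "2003", 90)]
cube = buc(rows, dims=(0, 1))
print(cube[((0, "East"), (1, "2003"))])  # 120
```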
We propose a robust watermarking scheme and several extensions for digital rights management of data cubes. The ownership information is hidden in a data cube by modifying a set of selected cell values. The owner can use a private key to control all the watermarking parameters freely. Neither the original data cube nor the watermark is required for watermark detection. Detailed analysis and extensive experiments evaluate the proposed schemes in terms of watermark detectability, robustness and efficiency. Our results show that the scheme performs well in actual applications.
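A hedged sketch of key-driven cell selection: an HMAC of the private key and a cell's coordinates decides which cells carry a mark bit. The parity-based embedding below is our illustrative assumption, not the paper's scheme.

```python
# Deterministic, key-controlled selection and marking of cube cells.
import hmac
import hashlib

def marked(key: bytes, coords: tuple, fraction: int = 16) -> bool:
    """Secretly pick roughly 1/fraction of the cells via a keyed hash."""
    digest = hmac.new(key, repr(coords).encode(), hashlib.sha256).digest()
    return digest[0] % fraction == 0

def embed(key: bytes, coords: tuple, value: float) -> float:
    """Set the parity of the integer part as the (key-derived) mark bit."""
    if not marked(key, coords):
        return value
    bit = hmac.new(key, b"bit" + repr(coords).encode(),
                   hashlib.sha256).digest()[0] & 1
    return value - (int(value) % 2) + bit

coords = ("East", 2003)
print(marked(b"secret", coords), embed(b"secret", coords, 120))
```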
Modeling plays an important role in the solution of complex research problems. When a database becomes large and complex, it is necessary to create a unified model for retrieving the desired information in minimum time and for implementing the model effectively. The present paper deals with modeling the search for desired information in a large database by storing the data inside three-dimensional data cubes. A sample case study uses real data related to ground water and municipal water supply, containing data from various localities of a city. For demonstration purposes a sample size of nine is taken, but when the number of localities across different cities becomes very large, it is necessary to store the data inside data cubes. The well-known object-oriented Unified Modeling Language (UML) is used to create unified class and state models. For verification purposes, sample queries are performed and the corresponding results are depicted.
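The three-dimensional storage itself is straightforward to sketch; the axis names and sample values below are hypothetical stand-ins for the locality/source/parameter data of the case study.

```python
# A minimal sketch of a locality x source x parameter data cube with
# name-to-index lookups for point queries.
import numpy as np

localities = ["loc1", "loc2", "loc3"]
sources = ["ground", "municipal"]
params = ["pH", "TDS"]

cube = np.zeros((len(localities), len(sources), len(params)))
cube[localities.index("loc1"), sources.index("ground"), params.index("pH")] = 7.2

def query(loc, src, par):
    """Return the stored measurement for one (locality, source, parameter)."""
    return cube[localities.index(loc), sources.index(src), params.index(par)]

print(query("loc1", "ground", "pH"))  # 7.2
```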
I/O parallelism is considered to be a promising approach to achieving high performance in parallel data warehousing systems, where huge amounts of data and complex analytical queries have to be processed. This paper proposes a parallel secondary data cube storage structure (PHC for short) to efficiently support the processing of range sum queries and dynamic updates on data cubes using parallel computing systems. Based on PHC, two parallel algorithms for processing range sum queries and updates are also proposed. Both algorithms have the same time complexity, O(log^d n / P). The analytical and experimental results show that PHC and the parallel algorithms deliver high performance and achieve optimum speedup.
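A single-dimension sketch of the hierarchical idea behind O(log n)-per-dimension range sums with dynamic updates is the Fenwick (binary indexed) tree below; PHC's parallel, d-dimensional layout is not reproduced.

```python
# Fenwick tree: both point updates and prefix sums cost O(log n).
class Fenwick:
    def __init__(self, n):
        self.t = [0] * (n + 1)

    def update(self, i, delta):          # add delta at 0-based index i
        i += 1
        while i < len(self.t):
            self.t[i] += delta
            i += i & -i

    def prefix(self, i):                 # sum of [0, i]
        i += 1
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

    def range_sum(self, lo, hi):         # sum of [lo, hi]
        return self.prefix(hi) - (self.prefix(lo - 1) if lo else 0)

f = Fenwick(8)
for i, v in enumerate([3, 1, 4, 1, 5, 9, 2, 6]):
    f.update(i, v)
print(f.range_sum(2, 5))  # 4 + 1 + 5 + 9 = 19
```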
Many approaches have been proposed to pre-compute data cubes in order to respond efficiently to OLAP queries in data warehouses. However, few have proposed solutions integrating all of the possible outcomes, and it is this idea that motivates the integration of hierarchical dimensions into these responses. To meet this need, we propose, in this paper, a complete redefinition of the framework and the formal definition of traditional database analysis through the prism of hierarchical dimensions. After characterizing the hierarchical data cube lattice, we introduce the hierarchical data cube and its most concise reduced representation, the closed hierarchical data cube. It offers a compact representation that optimizes storage space by removing the redundancies of strongly correlated data. Such data are typical of data warehouses, and in particular of video games, our field of study and experimentation, where hierarchical dimension attributes are widely represented.
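The closure idea can be illustrated with a toy, flat (non-hierarchical) example: cells covering exactly the same base tuples are redundant, so only the most specific one is kept. Data and the wildcard convention are hypothetical.

```python
# Keep one representative (the closed cell) per coverage class of cube cells.
rows = [("East", "PC", 1), ("East", "PC", 2)]   # (region, product, tuple id)

def cover(cell):
    """Base-tuple ids matching a cell; '*' is a wildcard coordinate."""
    return frozenset(r[2] for r in rows
                     if all(c in ("*", v) for c, v in zip(cell, r)))

cells = [("East", "*"), ("*", "PC"), ("East", "PC"), ("*", "*")]
closed = {}
for cell in cells:
    cov = cover(cell)
    best = closed.get(cov)
    # Among cells with identical coverage, keep the one with fewest wildcards.
    if best is None or cell.count("*") < best.count("*"):
        closed[cov] = cell

print(list(closed.values()))  # [('East', 'PC')] -- one closed cell suffices
```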