A data warehouse provides storage and management for massive amounts of data, but its schema evolves over time. Whenever the schema is changed, extended, or reduced, the data in the warehouse must conform to the new schema, so the warehouse has to be reorganized or reconstructed, a process that is laborious and wasteful. To cope with this problem, this paper develops an approach to modeling the data cube with XML, which has emerged as a universal format for data exchange on the Web and which makes the data warehouse flexible and scalable. The paper also extends the OLAP algebra to the XML-based data cube; the resulting algebra is called X-OLAP.
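As a minimal illustration of the idea of an XML-modeled cube (not the paper's X-OLAP algebra), the sketch below stores cube facts as XML elements and performs a simple roll-up over one dimension; the element and attribute names ("fact", "store", "month", "sales") are hypothetical.

```python
# Minimal sketch: cube facts stored as XML and a simple roll-up aggregation
# over one dimension. Element and attribute names are hypothetical.
import xml.etree.ElementTree as ET
from collections import defaultdict

cube_xml = """
<cube>
  <fact store="S1" month="2024-01" sales="120"/>
  <fact store="S1" month="2024-02" sales="80"/>
  <fact store="S2" month="2024-01" sales="200"/>
</cube>
"""

def rollup(xml_text: str, dim: str, measure: str) -> dict:
    """Aggregate the measure by a single dimension (a roll-up to that dimension)."""
    totals = defaultdict(float)
    for fact in ET.fromstring(xml_text).iter("fact"):
        totals[fact.get(dim)] += float(fact.get(measure))
    return dict(totals)

print(rollup(cube_xml, "store", "sales"))   # {'S1': 200.0, 'S2': 200.0}
print(rollup(cube_xml, "month", "sales"))   # {'2024-01': 320.0, '2024-02': 80.0}
```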
In a data cube there are often constraints between dimensions, or among attributes within a dimension, such as functional dependencies. We study how, when such functional dependencies exist, they can be used to speed up the computation of sparse data cubes. A new algorithm, CFD (Computation by Functional Dependencies), is presented for this purpose. CFD determines the order of dimensions by jointly considering the cardinalities of the dimensions and the functional dependencies between them, thereby reducing the number of partitioning passes for the dependent dimensions. CFD also combines bottom-up partitioning with top-down aggregate computation to further speed up the computation, and it can efficiently compute a data cube with dimension hierarchies from the finest granularity to the coarsest. Keywords: sparse data cube, functional dependency, dimension, partition, CFD. CLC number: TP 311. Foundation item: Supported by the E-Government Project of the Ministry of Science and Technology of China (2001BA110B01). Biography: Feng Yu-cai (1945-), male, Professor; research direction: database systems.
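The sketch below illustrates only the underlying intuition, not the published CFD algorithm: when one dimension functionally determines another, partitioning by the determining dimension already groups the determined one, so the latter needs no separate partitioning pass. The column layout and the dependency from column 0 (city) to column 1 (country) are hypothetical.

```python
# Hedged sketch of the intuition behind partitioning with functional
# dependencies: dimension 0 determines dimension 1, so grouping by 0 also
# yields the value of 1 "for free", without a second partitioning pass.
from itertools import groupby

rows = [  # (city, country, measure); country is determined by city
    ("Geneva", "CH", 5), ("Zurich", "CH", 3), ("Lyon", "FR", 7), ("Geneva", "CH", 2),
]
fds = {0: {1}}  # functional dependencies: column 0 determines column 1

# Order dimensions by decreasing cardinality, dropping determined ones.
cards = {i: len({r[i] for r in rows}) for i in range(2)}
determined = set().union(*fds.values())
order = sorted((i for i in cards if i not in determined), key=cards.get, reverse=True)

# Partition only by the determining dimensions; determined values come along.
for dim in order:
    for key, grp in groupby(sorted(rows, key=lambda r: r[dim]), key=lambda r: r[dim]):
        grp = list(grp)
        free = {grp[0][d] for d in fds.get(dim, ())}  # determined values, no extra pass
        print(dim, key, sorted(free), sum(r[-1] for r in grp))
```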
This paper investigates how to integrate Web data into a multidimensional data warehouse (cube) for comprehensive on-line analytical processing (OLAP) and decision making. An approach to Web-data-based cube construction is proposed, which includes Web data modeling based on MIX (Metadata-based Integration model for data X-change), the design of generic and specific mapping rules, and a transformation algorithm that maps Web data to a multidimensional array. In addition, the structure and implementation of a prototype Web-data-based cube are discussed. Funding: the National Natural Science Foundation of China (No. 60573165).
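A hedged sketch of the general idea of mapping semi-structured Web records into a multidimensional array follows; it does not implement the paper's MIX model or mapping rules, and the record fields and dimension names ("region", "product", "sales") are hypothetical.

```python
# Hedged sketch: each semi-structured record supplies one coordinate per
# dimension plus a measure, and is accumulated into a dense array.
import numpy as np

records = [  # e.g. extracted from Web pages or XML feeds
    {"region": "EU", "product": "A", "sales": 10.0},
    {"region": "EU", "product": "B", "sales": 4.0},
    {"region": "US", "product": "A", "sales": 7.0},
]

dims = ["region", "product"]
coords = {d: sorted({r[d] for r in records}) for d in dims}        # dimension members
index = {d: {v: i for i, v in enumerate(coords[d])} for d in dims}  # member -> position

cube = np.zeros([len(coords[d]) for d in dims])
for r in records:
    cube[tuple(index[d][r[d]] for d in dims)] += r["sales"]

print(coords)
print(cube)              # 2-D array: regions x products
print(cube.sum(axis=1))  # roll-up over products, per region
```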
On-Line Analytical Processing (OLAP) relies on the pre-computation of data cubes, which greatly reduces response time and improves OLAP performance. The Frag-Shells algorithm is a common pre-computation method; however, it depends so heavily on data dispersion that it performs poorly when confronted with large amounts of highly dispersed data. As data volumes grow rapidly, the efficiency of data cube construction is increasingly becoming a significant bottleneck. Moreover, with the popularity of cloud computing and big data, the MapReduce framework proposed by Google plays an increasingly prominent role in parallel processing, so it is natural to use MapReduce to improve the efficiency of parallel data cube construction. In this paper, by revising the Frag-Shells algorithm so that it no longer depends on data dispersion and by exploiting the high parallelism of the MapReduce framework, we propose an improved Frag-Shells algorithm on MapReduce. The simulation results show that the proposed algorithm greatly improves the efficiency of cube construction.
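To make the map/reduce decomposition of cube construction concrete, here is a minimal in-process sketch (not the improved Frag-Shells algorithm itself): the mapper emits one key per cuboid cell a tuple contributes to, and the reducer sums the measure per key. The dimension values are hypothetical and '*' marks a dimension that has been aggregated away.

```python
# Minimal in-process MapReduce-style cube aggregation sketch.
from collections import defaultdict
from itertools import combinations

rows = [("S1", "2024-01", 120), ("S1", "2024-02", 80), ("S2", "2024-01", 200)]
dims = 2  # number of dimension columns; the last column is the measure

def map_row(row):
    """Mapper: emit (cuboid cell, measure) for every subset of dimensions."""
    values, measure = row[:dims], row[dims]
    for k in range(dims + 1):
        for keep in combinations(range(dims), k):
            key = tuple(v if i in keep else "*" for i, v in enumerate(values))
            yield key, measure

# "Shuffle": group intermediate pairs by key, then "reduce": sum each group.
groups = defaultdict(list)
for row in rows:
    for key, m in map_row(row):
        groups[key].append(m)
cube = {key: sum(ms) for key, ms in groups.items()}

print(cube[("*", "*")])    # grand total: 400
print(cube[("S1", "*")])   # per-store roll-up: 200
```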
The technological landscape for managing big Earth observation (EO) data ranges from global solutions on large cloud infrastructures with web-based access to self-hosted implementations. EO data cubes are a leading technology for facilitating big EO data analysis and can be deployed at different spatial scales: local, national, regional, or global. Several EO data cubes with a geographic focus (“local EO data cubes”) have been implemented. However, their alignment with the Digital Earth (DE) vision, and the benefits and trade-offs of creating and maintaining them, ought to be examined further. We investigate local EO data cubes from five perspectives (science, business and industry, government and policy, education, communities and citizens) and illustrate four examples covering three continents at different geographic scales (Swiss Data Cube, semantic EO data cube for Austria, DE Africa, Virginia Data Cube). A local EO data cube can benefit many stakeholders and players but requires several technical developments, including enabling local EO data cubes based on public, global, and cloud-native EO data, as well as streaming and interoperability between local EO data cubes. We argue that blurring the dichotomy between global and local aligns with the DE vision of accessing the world's knowledge and exploring information about the planet. Funding: the Austrian Research Promotion Agency (FFG) under the Austrian Space Application Programme (ASAP), projects Sen2Cube.at (project no. 866016), SemantiX (project no. 878939), and SIMS (project no. 885365).
Pressures on natural resources are increasing, and a number of challenges must be overcome to meet the needs of a growing population in a period of environmental variability. Some of these environmental issues can be monitored using remotely sensed Earth Observation (EO) data, which are increasingly available from a number of freely and openly accessible repositories. However, the full information potential of EO data has not yet been realized. They remain underutilized, mainly because of their complexity, increasing volume, and the lack of efficient processing capabilities. EO Data Cubes (DC) are a new paradigm that aims to realize the full potential of EO data by lowering the barriers caused by these Big Data challenges and by providing access to large spatio-temporal data in an analysis-ready form. Systematic and regular provision of Analysis Ready Data (ARD) will significantly reduce the burden on EO data users. Nevertheless, ARD are not commonly produced by data providers, so obtaining uniform and consistent ARD remains a challenging task. This paper presents an approach that enables rapid data access and pre-processing to generate ARD using interoperable service chains. The approach has been tested and validated by generating Landsat ARD while building the Swiss Data Cube. Funding: the Swiss Federal Office for the Environment (FOEN), for its financial support of the Swiss Data Cube.
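A minimal sketch of the kind of pre-processing chain such an approach composes is shown below; it is not the Swiss Data Cube implementation, and the scale factor, offset, and QA bit convention are assumptions for illustration.

```python
# Hedged sketch of an ARD-style pre-processing chain: small, composable steps
# that turn raw digital numbers into cloud-masked surface reflectance.
import numpy as np

def to_reflectance(dn: np.ndarray, scale: float = 2.75e-5, offset: float = -0.2) -> np.ndarray:
    """Convert digital numbers to surface reflectance (assumed scale/offset)."""
    return dn * scale + offset

def mask_clouds(refl: np.ndarray, qa: np.ndarray, cloud_bit: int = 3) -> np.ndarray:
    """Set cloudy pixels (assumed QA bit set) to NaN so they are ignored downstream."""
    return np.where((qa >> cloud_bit) & 1, np.nan, refl)

def ard_chain(dn: np.ndarray, qa: np.ndarray) -> np.ndarray:
    return mask_clouds(to_reflectance(dn.astype("float64")), qa)

dn = np.array([[7300, 8000], [9000, 7500]], dtype=np.uint16)
qa = np.array([[0, 8], [0, 0]], dtype=np.uint16)   # bit 3 set => cloud
print(ard_chain(dn, qa))
```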
Avoiding, reducing, and reversing land degradation and restoring degraded land are urgent priorities to protect the biodiversity and ecosystem services that are vital to life on Earth. To halt and reverse current trends in land degradation, there is an immediate need to enhance national capacities to undertake quantitative assessments and mapping of degraded lands, as required by the Sustainable Development Goals (SDGs), in particular SDG indicator 15.3.1 (“proportion of land that is degraded over total land area”). Earth Observations (EO) can play an important role both in generating this indicator and in complementing or enhancing national official data sources. Implementations such as Trends.Earth, which monitor land degradation in accordance with SDG 15.3.1, rely on default datasets of coarse spatial resolution provided by MODIS or AVHRR. Consequently, there is a need to develop methodologies that benefit from medium- to high-resolution satellite EO data (e.g. Landsat or the Sentinels). In response to this issue, this paper presents an initial overview of an innovative approach to monitoring land degradation at the national scale, in compliance with the SDG 15.3.1 indicator, using Landsat observations in a data cube; further work is required to improve the calculation of the three sub-indicators. Funding: the European Commission Horizon 2020 Programme, ERA-PLANET/GEOEssential project, grant number 689443.
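As a concrete illustration of how the indicator can be assembled from per-pixel sub-indicators, the hedged sketch below assumes the commonly used one-out-all-out rule (a pixel counts as degraded if any sub-indicator flags degradation); the tiny arrays and their encoding are illustrative only and do not reproduce the paper's calculation.

```python
# Hedged sketch of combining per-pixel sub-indicators into the SDG 15.3.1
# proportion of degraded land, assuming the one-out-all-out rule.
# Encoding: -1 = degraded, 0 = stable, 1 = improved (values are illustrative).
import numpy as np

land_cover   = np.array([[0, -1], [1, 0]])
productivity = np.array([[0,  0], [-1, 0]])
soil_carbon  = np.array([[0,  0], [0,  0]])

degraded = (land_cover < 0) | (productivity < 0) | (soil_carbon < 0)
proportion = degraded.sum() / degraded.size
print(f"SDG 15.3.1: {proportion:.0%} of land area degraded")   # 50%
```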
QC-Tree is one of the most storage-efficient structures for data cubes in a MOLAP system. Although QC-Tree achieves a high compression ratio, it is still a fully materialized data cube. In this paper, an improved structure, PMC, is presented that allows us to materialize only a part of the cells in a QC-Tree and thus save more storage space. There is a notable difference between our partial materialization algorithm and traditional materialized-view selection algorithms. In a traditional algorithm, when a view is selected, all the cells in that view are materialized; if a view is not selected, none of its cells are materialized. This strategy results in unstable query performance. The presented algorithm, by contrast, selects and materializes data at the cell level and, along with further reduced space and update cost, ensures stable query performance. A series of experiments were conducted on both synthetic and real data sets. The results show that PMC further reduces the storage space occupied by the data cube and shortens the time needed to update the cube. Funding: the National Key Scientific and Technological Project "Research on the Management of the Railroad Fundamental Information" (Grant No. 2002BA407B01-2) and the Science Foundation of Beijing Jiaotong University (Grant No. 2003SZ003).
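To show what cell-level (rather than view-level) materialization means in the simplest terms, here is a hedged sketch that is not the PMC structure: only cells backed by at least a support threshold of base tuples are stored, and other queries fall back to the base table; the threshold and the data are hypothetical.

```python
# Hedged sketch of cell-level partial materialization: store only "heavy"
# cells and answer the remaining queries by scanning the base table.
from collections import Counter

base = [("S1", "A"), ("S1", "A"), ("S1", "B"), ("S2", "A")]  # (store, product)
min_support = 2  # hypothetical materialization threshold

counts = Counter(base)
materialized = {cell: n for cell, n in counts.items() if n >= min_support}

def count_query(cell):
    if cell in materialized:                  # answered from the materialized cells
        return materialized[cell]
    return sum(1 for t in base if t == cell)  # fallback: scan the base table

print(count_query(("S1", "A")))  # 2, from the materialized part
print(count_query(("S2", "A")))  # 1, computed on the fly
```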
The results of a data cube occupy a huge amount of disk space when the base table has a large number of attributes. Compact data cubes, such as the condensed cube and the quotient cube, were proposed to solve this problem; they compress the data cube dramatically. However, their query cost is so high that they cannot be used in most applications. This paper introduces the semi-closed cube, which reduces the size of the data cube while achieving almost the same query response time as the full data cube. The semi-closed cube is a generalization of the condensed cube and the quotient cube and is constructed from a quotient cube. When the query cost of the quotient cube exceeds a given threshold, the semi-closed cube selects some views and picks a fellow for each of them. All the tuples of those views are materialized except those closed by their fellows. To find a tuple of one of those views, users only need to scan the view and its fellow, so query performance is improved. Experiments were conducted using a real-world data set. The results show that the semi-closed cube is an effective data cube representation.
The effort and cost required to convert satellite Earth Observation (EO) data into meaningful geophysical variables has prevented the systematic analysis of all available observations. To overcome these problems, we utilise an integrated High Performance Computing and Data environment to rapidly process, restructure and analyse the Australian Landsat data archive. In this approach, the EO data are assigned to a common grid framework that spans the full geospatial and temporal extent of the observations: the EO Data Cube. This approach is pixel-based and incorporates geometric and spectral calibration and quality assurance of each Earth surface reflectance measurement. We demonstrate the utility of the approach with rapid time-series mapping of surface water across the entire Australian continent using 27 years of continuous, 25 m resolution observations. Our preliminary analysis of the Landsat archive shows how the EO Data Cube can effectively liberate high-resolution EO data from their complex sensor-specific data structures and revolutionise our ability to measure environmental change.
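The sketch below gives a hedged, toy illustration of pixel-based time-series water mapping on such a cube; it is not the algorithm used for the continental analysis, and the NDWI threshold and synthetic reflectance values are assumptions.

```python
# Hedged sketch of per-pixel time-series water mapping on a tiny (time, y, x)
# cube: classify each observation with an NDWI threshold, then compute the
# fraction of observations in which each pixel is flagged as water.
import numpy as np

# Green and near-infrared surface reflectance, shape (time, y, x)
green = np.array([[[0.10, 0.30]], [[0.09, 0.28]], [[0.11, 0.05]]])
nir   = np.array([[[0.30, 0.05]], [[0.28, 0.06]], [[0.29, 0.20]]])

ndwi = (green - nir) / (green + nir)
water = ndwi > 0.0                      # per-observation water mask (assumed threshold)
water_frequency = water.mean(axis=0)    # fraction of observations flagged as water

print(water_frequency)   # e.g. pixel (0, 1) is water in 2 of 3 observations
```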
Data cube computation is an important problem in the field of data warehousing and OLAP (online analytical processing). Although it has been studied extensively, most existing algorithms are designed without considering CPU and cache behavior. In this paper, we first propose a cache-conscious cubing approach, called CC-Cubing, to efficiently compute data cubes on a modern processor. This method improves CPU and cache performance. It adopts an integrated depth-first and breadth-first partitioning order and partitions multiple dimensions simultaneously. The partitioning scheme improves data spatial locality and increases the utilization of cache lines. Software prefetching techniques are then applied in the sorting phase to hide the expensive cache misses associated with data scans. In addition, a cache-aware method is used in CC-Cubing to switch the sort algorithm dynamically. Our performance study shows that CC-Cubing outperforms BUC, Star-Cubing and MM-Cubing in most cases. Then, in order to fully utilize an SMT (simultaneous multithreading) processor, we present a thread-based CC-Cubing-SMT method. This parallel method provides an improvement of up to 27% over the single-threaded CC-Cubing algorithm. Funding: a grant from HP Labs China, the National Natural Science Foundation of China under Grant No. 60496325, and the Main Memory OLAP Servers Project.
We propose a robust watermarking scheme, along with several extensions, for the digital rights management of data cubes. The ownership information is hidden in a data cube by modifying a set of selected cell values. The owner can use a private key to freely control all the watermarking parameters. Neither the original data cube nor the watermark is required for watermark detection. Detailed analysis and extensive experiments are conducted for the proposed schemes in terms of watermark detectability, robustness and efficiency. Our results show that the scheme performs well in practical applications. Funding: the National Natural Science Foundation of China (No. 60703032) and the National High Technology Research and Development Program (863) of China (No. 2007AA01Z456).
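For intuition, the hedged sketch below shows a generic key-based cell-selection and bit-embedding scheme in the style of relational-data watermarking; it is not the paper's scheme, and the gamma and bit-depth parameters are hypothetical.

```python
# Hedged sketch of key-based cell watermarking: a keyed hash of each cell's
# coordinates decides whether the cell carries a mark and which low-order
# bit to set. GAMMA and LSB_BITS are hypothetical parameters.
import hmac, hashlib

KEY = b"owner-private-key"
GAMMA = 3          # roughly 1 in GAMMA cells is marked
LSB_BITS = 2       # only the lowest bits may be perturbed

def keyed_hash(coords) -> int:
    msg = ",".join(map(str, coords)).encode()
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:8], "big")

def embed(cell_value: int, coords) -> int:
    h = keyed_hash(coords)
    if h % GAMMA != 0:
        return cell_value                  # cell not selected for marking
    bit_pos = (h // GAMMA) % LSB_BITS      # which low-order bit to modify
    mark = (h >> 8) & 1                    # key-dependent mark bit
    return (cell_value & ~(1 << bit_pos)) | (mark << bit_pos)

print(embed(1200, ("S1", "2024-01")), embed(305, ("S2", "2024-02")))
```

Detection would recompute the same keyed hash over the suspect cube and test whether the expected bits match significantly more often than chance, so neither the original cube nor a stored watermark is needed.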