This paper investigates how to integrate Web data into a multidimensional data warehouse (cube) for comprehensive on-line analytical processing (OLAP) and decision making. An approach for Web data-based cube const...This paper investigates how to integrate Web data into a multidimensional data warehouse (cube) for comprehensive on-line analytical processing (OLAP) and decision making. An approach for Web data-based cube construction is proposed, which includes Web data modeling based on MIX ( Metadam based Integration model for data X-change ), generic and specific mapping rules design, and a transformation algorithm for mapping Web data to a multidimensional array. Besides, the structure and implementation of the prototype of a Web data base cube are discussed.展开更多
A conceptual,data-driven framework for organizing the data analytics and control functions in an electrical distribution network is proposed in this paper.The framework is built such that it tightly corresponds to the...A conceptual,data-driven framework for organizing the data analytics and control functions in an electrical distribution network is proposed in this paper.The framework is built such that it tightly corresponds to the naturally existing physical hierarchy of typical radial distribution networks,allowing for an organized and highly-localized set of data storage and analytics processes,which in turn correspond well to likely control commands.By utilizing this structure,the computational entities in the framework are endowed with persistent local situational awareness.However,the framework also permits,through a series of tiered communications,the operation of a centralized authority for overall system observability and controllability.展开更多
Database system is the infrastructure of the modern information system. The R&D in the database system and its technologies is one of the important research topics in the field. The database R&D in China took off la...Database system is the infrastructure of the modern information system. The R&D in the database system and its technologies is one of the important research topics in the field. The database R&D in China took off later but it moves along by giant steps. This report presents the achievements Renmin University of China (RUC) has made in the past 25 years and at the same time addresses some of the research projects we, RUC, are currently working on. The National Natural Science Foundation of China supports and initiates most of our research projects and these successfully conducted projects have produced fruitful results.展开更多
QC-Tree is one of the most storage-efficient structures for data cubes in an MOLAP system. Although QC- Tree can achieve a high compression ratio, it is still a fully materialized data cube. In this paper, an improved...QC-Tree is one of the most storage-efficient structures for data cubes in an MOLAP system. Although QC- Tree can achieve a high compression ratio, it is still a fully materialized data cube. In this paper, an improved structure PMC is presented allowing us to materialize only a part of the cells in a QC-Tree to save more storage space. There is a notable difference between our partially materialization algorithm and traditional materialized views selection algorithms. In a traditional algorithm, when a view is selected, all the cells in this view are to be materialized. Otherwise, if a view is not selected, all the cells in this view will not be materialized. This strategy results in the unstable query performance. The presented algorithm, however, selects and materializes data in cell level, and, along with further reduced space and update cost, it can ensure a stable query performance. A series of experiments are conducted on both synthetic and real data sets. The results show that PMC can further reduce storage space occupied by the data cube, and can shorten the time to update the cube.展开更多
Data warehouse (DW) modeling is a complicated task, involving both knowledge of business processes and familiarity with operational information systems structure and behavior. Existing DW modeling techniques suffer ...Data warehouse (DW) modeling is a complicated task, involving both knowledge of business processes and familiarity with operational information systems structure and behavior. Existing DW modeling techniques suffer from the following major drawbacks -- data-driven approach requires high levels of expertise and neglects the requirements of end users, while demand-driven approach lacks enterprise-wide vision and is regardless of existing models of underlying operational systems. In order to make up for those shortcomings, a method of classification of schema elements for DW modeling is proposed in this paper. We first put forward the vector space models for subjects and schema elements, then present an adaptive approach with self-tuning theory to construct context vectors of subjects, and finally classify the source schema elements into different subjects of the DW automatically. Benefited from the result of the schema elements classification, designers can model and construct a DW more easily.展开更多
Data cube computation is an important problem in the field of data warehousing and OLAP (online analytical processing). Although it has been studied extensively in the past, most of its algorithms are designed witho...Data cube computation is an important problem in the field of data warehousing and OLAP (online analytical processing). Although it has been studied extensively in the past, most of its algorithms are designed without considering CPU and cache behavior. In this paper, we first propose a cache-conscious cubing approach called CC-Cubing to efficiently compute data cubes on a modern processor. This method can enhance CPU and cache performances. It adopts an integrated depth-first and breadth-first partitioning order and partitions multiple dimensions simultaneously. The partitioning scheme improves the data spatial locality and increases the utilization of cache lines. Software prefetching techniques are then applied in the sorting phase to hide the expensive cache misses associated with data scans. In addition, a cache-aware method is used in CC-Cubing to switch the sort algorithm dynamically. Our performance study shows that CC-Cubing outperforms BUC, Star-Cubing and MM-Cubing in most cases. Then, in order to fully utilize an SMT (simultaneous multithreading) processor, we present a thread-based CC-Cubing-SMT method. This parallel method provides an improvement up to 27% for the single-threaded CC-Cubing algorithm.展开更多
基金The National Natural Science Foundation of China (No.60573165)
文摘This paper investigates how to integrate Web data into a multidimensional data warehouse (cube) for comprehensive on-line analytical processing (OLAP) and decision making. An approach for Web data-based cube construction is proposed, which includes Web data modeling based on MIX ( Metadam based Integration model for data X-change ), generic and specific mapping rules design, and a transformation algorithm for mapping Web data to a multidimensional array. Besides, the structure and implementation of the prototype of a Web data base cube are discussed.
基金supported by the Science and Technology Program of the State Grid Corporation of China(DZB51201403772)the National Natural Science and Technology Fund of China(51261130472).
文摘A conceptual,data-driven framework for organizing the data analytics and control functions in an electrical distribution network is proposed in this paper.The framework is built such that it tightly corresponds to the naturally existing physical hierarchy of typical radial distribution networks,allowing for an organized and highly-localized set of data storage and analytics processes,which in turn correspond well to likely control commands.By utilizing this structure,the computational entities in the framework are endowed with persistent local situational awareness.However,the framework also permits,through a series of tiered communications,the operation of a centralized authority for overall system observability and controllability.
基金Supported by the National Natural Science Foundation of China. Acknowledgements The National Science Foundation of China supported these works. Thanks to NSFC and all the members of the research groups in Renmin University of China.
文摘Database system is the infrastructure of the modern information system. The R&D in the database system and its technologies is one of the important research topics in the field. The database R&D in China took off later but it moves along by giant steps. This report presents the achievements Renmin University of China (RUC) has made in the past 25 years and at the same time addresses some of the research projects we, RUC, are currently working on. The National Natural Science Foundation of China supports and initiates most of our research projects and these successfully conducted projects have produced fruitful results.
基金Supported by the National Key Scientific and Technological Project: Research on the Management of the Railroad Fundamental Information (Grant No.2002BA407B01-2) and the Science Foundation of Beijing Jiaotong University (Grant No.2003SZ003).
文摘QC-Tree is one of the most storage-efficient structures for data cubes in an MOLAP system. Although QC- Tree can achieve a high compression ratio, it is still a fully materialized data cube. In this paper, an improved structure PMC is presented allowing us to materialize only a part of the cells in a QC-Tree to save more storage space. There is a notable difference between our partially materialization algorithm and traditional materialized views selection algorithms. In a traditional algorithm, when a view is selected, all the cells in this view are to be materialized. Otherwise, if a view is not selected, all the cells in this view will not be materialized. This strategy results in the unstable query performance. The presented algorithm, however, selects and materializes data in cell level, and, along with further reduced space and update cost, it can ensure a stable query performance. A series of experiments are conducted on both synthetic and real data sets. The results show that PMC can further reduce storage space occupied by the data cube, and can shorten the time to update the cube.
基金Supported by the National Natural Science Foundation of China under Grant No. 60403041, the Project of National "10th Five-Year Plan" of China under Grant No. 2001BA102A01, and the National Grand Fundamental Research 973 Program of China under Grant No. 1999032705. Acknowledgements The first author is very thankful to Jian Liu, Bo Yu, Peng Zhang and Ling-Fu Li for their valuable suggestions on this paper. The authors are also grateful to anonymous reviewers for their insightful comments on this paper, which have helped to improve the manuscript.
文摘Data warehouse (DW) modeling is a complicated task, involving both knowledge of business processes and familiarity with operational information systems structure and behavior. Existing DW modeling techniques suffer from the following major drawbacks -- data-driven approach requires high levels of expertise and neglects the requirements of end users, while demand-driven approach lacks enterprise-wide vision and is regardless of existing models of underlying operational systems. In order to make up for those shortcomings, a method of classification of schema elements for DW modeling is proposed in this paper. We first put forward the vector space models for subjects and schema elements, then present an adaptive approach with self-tuning theory to construct context vectors of subjects, and finally classify the source schema elements into different subjects of the DW automatically. Benefited from the result of the schema elements classification, designers can model and construct a DW more easily.
基金supported in part by a grant from HP Labs China,the National Natural Science Foundation of China under GrantNo.60496325the Main Memory OLAP Servers Project
文摘Data cube computation is an important problem in the field of data warehousing and OLAP (online analytical processing). Although it has been studied extensively in the past, most of its algorithms are designed without considering CPU and cache behavior. In this paper, we first propose a cache-conscious cubing approach called CC-Cubing to efficiently compute data cubes on a modern processor. This method can enhance CPU and cache performances. It adopts an integrated depth-first and breadth-first partitioning order and partitions multiple dimensions simultaneously. The partitioning scheme improves the data spatial locality and increases the utilization of cache lines. Software prefetching techniques are then applied in the sorting phase to hide the expensive cache misses associated with data scans. In addition, a cache-aware method is used in CC-Cubing to switch the sort algorithm dynamically. Our performance study shows that CC-Cubing outperforms BUC, Star-Cubing and MM-Cubing in most cases. Then, in order to fully utilize an SMT (simultaneous multithreading) processor, we present a thread-based CC-Cubing-SMT method. This parallel method provides an improvement up to 27% for the single-threaded CC-Cubing algorithm.