A uniform metadata representation is introduced for heterogeneous databases, multi media information and other information sources. Some features about metadata are analyzed. The limitation of existing metadata model...A uniform metadata representation is introduced for heterogeneous databases, multi media information and other information sources. Some features about metadata are analyzed. The limitation of existing metadata model is compared with the new one. The metadata model is described in XML which is fit for metadata denotation and exchange. The well structured data, semi structured data and those exterior file data without structure are described in the metadata model. The model provides feasibility and extensibility for constructing uniform metadata model of data warehouse.展开更多
Engineering data are separately organized and their schemas are increasingly complex and variable. Engineering data management systems are needed to be able to manage the unified data and to be both customizable and e...Engineering data are separately organized and their schemas are increasingly complex and variable. Engineering data management systems are needed to be able to manage the unified data and to be both customizable and extensible. The design of the systems is heavily dependent on the flexibility and self-description of the data model. The characteristics of engineering data and their management facts are analyzed. Then engineering data warehouse (EDW) architecture and multi-layer metamodels are presented. Also an approach to manage anduse engineering data by a meta object is proposed. Finally, an application flight test EDW system (FTEDWS) is described and meta-objects to manage engineering data in the data warehouse are used. It shows that adopting a meta-modeling approach provides a support for interchangeability and a sufficiently flexible environment in which the system evolution and the reusability can be handled.展开更多
The effectiveness of the Business Intelligence(BI)system mainly depends on the quality of knowledge it produces.The decision-making process is hindered,and the user’s trust is lost,if the knowledge offered is undesir...The effectiveness of the Business Intelligence(BI)system mainly depends on the quality of knowledge it produces.The decision-making process is hindered,and the user’s trust is lost,if the knowledge offered is undesired or of poor quality.A Data Warehouse(DW)is a huge collection of data gathered from many sources and an important part of any BI solution to assist management in making better decisions.The Extract,Transform,and Load(ETL)process is the backbone of a DW system,and it is responsible for moving data from source systems into the DW system.The more mature the ETL process the more reliable the DW system.In this paper,we propose the ETL Maturity Model(EMM)that assists organizations in achieving a high-quality ETL system and thereby enhancing the quality of knowledge produced.The EMM is made up of five levels of maturity i.e.,Chaotic,Acceptable,Stable,Efficient and Reliable.Each level of maturity contains Key Process Areas(KPAs)that have been endorsed by industry experts and include all critical features of a good ETL system.Quality Objectives(QOs)are defined procedures that,when implemented,resulted in a high-quality ETL process.Each KPA has its own set of QOs,the execution of which meets the requirements of that KPA.Multiple brainstorming sessions with relevant industry experts helped to enhance the model.EMMwas deployed in two key projects utilizing multiple case studies to supplement the validation process and support our claim.This model can assist organizations in improving their current ETL process and transforming it into a more mature ETL system.This model can also provide high-quality information to assist users inmaking better decisions and gaining their trust.展开更多
Integrating heterogeneous data sources is a precondition to share data for enterprises. Highly-efficient data updating can both save system expenses, and offer real-time data. It is one of the hot issues to modify dat...Integrating heterogeneous data sources is a precondition to share data for enterprises. Highly-efficient data updating can both save system expenses, and offer real-time data. It is one of the hot issues to modify data rapidly in the pre-processing area of the data warehouse. An extract transform loading design is proposed based on a new data algorithm called Diff-Match,which is developed by utilizing mode matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and seek out different sets of data. Its efficiency has been proved by its successful application in an enterprise of electric apparatus groups.展开更多
The analysis of relevant standards and guidelines proved the lack of information on actions and activities concerning data warehouse testing. The absence of the complex data warehouse testing methodology seems to be c...The analysis of relevant standards and guidelines proved the lack of information on actions and activities concerning data warehouse testing. The absence of the complex data warehouse testing methodology seems to be crucial particularly in the phase of the data warehouse implementation. The aim of this article is to suggest basic data warehouse testing activities as a final part of data warehouse testing methodology. The testing activities that must be implemented in the process of the data warehouse testing can be split into four logical units regarding the multidimensional database testing, data pump testing, metadata and OLAP (Online Analytical Processing) testing. Between main testing activities can be included: revision of the multidimensional database scheme, optimizing of fact tables number, problem of data explosion, testing for correctness of aggregation and summation of data etc.展开更多
In modern workforce management,the demand for new ways to maximize worker satisfaction,productivity,and security levels is endless.Workforce movement data such as those source data from an access control system can su...In modern workforce management,the demand for new ways to maximize worker satisfaction,productivity,and security levels is endless.Workforce movement data such as those source data from an access control system can support this ongoing process with subsequent analysis.In this study,a solution to attaining this goal is proposed,based on the design and implementation of a data mart as part of a dimensional trajectory data warehouse(TDW)that acts as a repository for the management of movement data.A novel methodological approach is proposed for modeling multiple spatial and temporal dimensions in a logical model.The case study presented in this paper for modeling and analyzing workforce movement data is to support human resource management decision-making and the following discussion provides a representative example of the contribution of a TDW in the process of information management and decision support systems.The entire process of exporting,cleaning,consolidating,and transforming data is implemented to achieve an appropriate format for final import.Structured query language(SQL)queries demonstrate the convenience of dimensional design for data analysis,and valuable information can be extracted from the movements of employees on company premises to manage the workforce efficiently and effectively.Visual analytics through data visualization support the analysis and facilitate decisionmaking and business intelligence.展开更多
Students’grades not only serve as an effective indicator of their learning achievements but also to some extent reflect the completion of teaching tasks by the instructors.Currently,many universities across the count...Students’grades not only serve as an effective indicator of their learning achievements but also to some extent reflect the completion of teaching tasks by the instructors.Currently,many universities across the country have collected and recorded various information about students and teachers in the school’s information management system,but it is only a simple storage record and has not effectively excavated hidden information,and data have not been fully utilized.Student performance information,enrolment information,course information,teaching plans,and teacher-related information are currently stored in separate databases,which are independent of each other,making it difficult to perform effective data analysis.Data warehousing technology can integrate various information and use data analysis software to excavate more high-value information,which is convenient for teaching evaluation and optimizing teaching strategies.Based on data warehousing technology,the article uses the hierarchical concept of data warehousing to construct the ODS layer,DWD layer,DWS layer and ETL layer.Facing the data warehousing topic,the article designs the data warehousing conceptual model,logical model,and physical model based on student performance,providing a model basis for later data mining.展开更多
In the era of Big Data,many NoSQL databases emerged for the storage and later processing of vast volumes of data,using data structures that can follow columnar,key-value,document or graph formats.For analytical contex...In the era of Big Data,many NoSQL databases emerged for the storage and later processing of vast volumes of data,using data structures that can follow columnar,key-value,document or graph formats.For analytical contexts,requiring a Big Data Warehouse,Hive is used as the driving force,allowing the analysis of vast amounts of data.Data models in Hive are usually defined taking into consideration the queries that need to be answered.In this work,a set of rules is presented for the transformation of multidimensional data models into Hive tables,making available data at different levels of detail.These several levels are suited for answering different queries,depending on the analytical needs.After the identification of the Hive tables,this paper summarizes a demonstration case in which the implementation of a specific Big Data architecture shows how the evolution from a traditional Data Warehouse to a Big Data Warehouse is possible.展开更多
Data structure and semantics of the traditional data model cannot effectively represent the data warehouse, it is difficult to effectively support online analytical processing (referred to as OLAP). This paper is pr...Data structure and semantics of the traditional data model cannot effectively represent the data warehouse, it is difficult to effectively support online analytical processing (referred to as OLAP). This paper is propose a new multidimensional data model based on the partial ordering and mapping. The data model can fully express the complex data structure and semantics of data warehouse, and provide an OLAP operation as the core of the operation of algebra, support structure in levels of complex aggregation operation sequence, which can effectively support the application of OLAE The data model supports the concept of aggregation function constraint, and provides constraint mechanism of the hierarchy aggregation function.展开更多
The problem of storage and querying of large volumes of spatial grids is an issue to solve.In this paper,we propose a method to optimize queries to aggregate raster grids stored in databases.In our approach,we propose...The problem of storage and querying of large volumes of spatial grids is an issue to solve.In this paper,we propose a method to optimize queries to aggregate raster grids stored in databases.In our approach,we propose to estimate the exact result rather than calculate the exact result.This approach reduces query execution time.One advantage of our method is that it does not require implementing or modifying functionalities of database management systems.Our approach is based on a new data structure and a specific model of SQL queries.Our work is applied here to relational data warehouses.展开更多
Marine information has been increasing quickly. The traditional database technologies have disadvantages in manipulating large amounts of marine information which relates to the position in 3-D with the time. Recently...Marine information has been increasing quickly. The traditional database technologies have disadvantages in manipulating large amounts of marine information which relates to the position in 3-D with the time. Recently, greater emphasis has been placed on GIS (geographical information system)to deal with the marine information. The GIS has shown great success for terrestrial applications in the last decades, but its use in marine fields has been far more restricted. One of the main reasons is that most of the GIS systems or their data models are designed for land applications. They cannot do well with the nature of the marine environment and for the marine information. And this becomes a fundamental challenge to the traditional GIS and its data structure. This work designed a data model, the raster-based spatio-temporal hierarchical data model (RSHDM), for the marine information system, or for the knowledge discovery fi'om spatio-temporal data, which bases itself on the nature of the marine data and overcomes the shortages of the current spatio-temporal models when they are used in the field. As an experiment, the marine fishery data warehouse (FDW) for marine fishery management was set up, which was based on the RSHDM. The experiment proved that the RSHDM can do well with the data and can extract easily the aggregations that the management needs at different levels.展开更多
Based on the experience and achievement of the"China Digital Ocean", the classification plan for Marine data elements is made, which can be classified into five, including marine point elements, marine line elements...Based on the experience and achievement of the"China Digital Ocean", the classification plan for Marine data elements is made, which can be classified into five, including marine point elements, marine line elements, marine polygon elements, marine grid elements and marine dynamic elements. In this paper, the technology of features and object-oriented method, a spatial-temporal data model is proposed, which can be applied in the large information system engineering like the "Digital Ocean", and this paper discusses the application of spatial data model, marine three-dimensional raster data model and relation data model in the building of Data Warehouse in "China Digital Ocean", and concludes the merits of these models.展开更多
In order to exchange and share information among the conceptual models of data warehouse, and to build a solid base for the integration and share of metadata, a new multidimensional concept model is presented based on...In order to exchange and share information among the conceptual models of data warehouse, and to build a solid base for the integration and share of metadata, a new multidimensional concept model is presented based on XML and its DTD is defined, which can perfectly describe various semantic characteristics of multidimensional conceptual model. According to the multidimensional conceptual modeling technique which is based on UML, the mapping algorithm between the multidimensional conceptual model is described based on XML and UML class diagram, and an application base for the wide use of this technique is given.展开更多
The objectives of quality management systems which are based on data warehouses are to acquire, store, and process quality control data within an enterprise, and to facilitate analysis, control and decision making bas...The objectives of quality management systems which are based on data warehouses are to acquire, store, and process quality control data within an enterprise, and to facilitate analysis, control and decision making based on this data. This paper discusses the DB/ODS/DW (traditional database/operational data store/data warehouse) architecture, data granularity and data partition in the data warehouse, describes the data model, and presents the client/server platform model.展开更多
文摘A uniform metadata representation is introduced for heterogeneous databases, multi media information and other information sources. Some features about metadata are analyzed. The limitation of existing metadata model is compared with the new one. The metadata model is described in XML which is fit for metadata denotation and exchange. The well structured data, semi structured data and those exterior file data without structure are described in the metadata model. The model provides feasibility and extensibility for constructing uniform metadata model of data warehouse.
文摘Engineering data are separately organized and their schemas are increasingly complex and variable. Engineering data management systems are needed to be able to manage the unified data and to be both customizable and extensible. The design of the systems is heavily dependent on the flexibility and self-description of the data model. The characteristics of engineering data and their management facts are analyzed. Then engineering data warehouse (EDW) architecture and multi-layer metamodels are presented. Also an approach to manage anduse engineering data by a meta object is proposed. Finally, an application flight test EDW system (FTEDWS) is described and meta-objects to manage engineering data in the data warehouse are used. It shows that adopting a meta-modeling approach provides a support for interchangeability and a sufficiently flexible environment in which the system evolution and the reusability can be handled.
基金King Saud University for funding this work through Researchers Supporting Project Number(RSP-2021/387),King Saud University,Riyadh,Saudi Arabia.
文摘The effectiveness of the Business Intelligence(BI)system mainly depends on the quality of knowledge it produces.The decision-making process is hindered,and the user’s trust is lost,if the knowledge offered is undesired or of poor quality.A Data Warehouse(DW)is a huge collection of data gathered from many sources and an important part of any BI solution to assist management in making better decisions.The Extract,Transform,and Load(ETL)process is the backbone of a DW system,and it is responsible for moving data from source systems into the DW system.The more mature the ETL process the more reliable the DW system.In this paper,we propose the ETL Maturity Model(EMM)that assists organizations in achieving a high-quality ETL system and thereby enhancing the quality of knowledge produced.The EMM is made up of five levels of maturity i.e.,Chaotic,Acceptable,Stable,Efficient and Reliable.Each level of maturity contains Key Process Areas(KPAs)that have been endorsed by industry experts and include all critical features of a good ETL system.Quality Objectives(QOs)are defined procedures that,when implemented,resulted in a high-quality ETL process.Each KPA has its own set of QOs,the execution of which meets the requirements of that KPA.Multiple brainstorming sessions with relevant industry experts helped to enhance the model.EMMwas deployed in two key projects utilizing multiple case studies to supplement the validation process and support our claim.This model can assist organizations in improving their current ETL process and transforming it into a more mature ETL system.This model can also provide high-quality information to assist users inmaking better decisions and gaining their trust.
基金Supported by National Natural Science Foundation of China (No. 50475117)Tianjin Natural Science Foundation (No.06YFJMJC03700).
文摘Integrating heterogeneous data sources is a precondition to share data for enterprises. Highly-efficient data updating can both save system expenses, and offer real-time data. It is one of the hot issues to modify data rapidly in the pre-processing area of the data warehouse. An extract transform loading design is proposed based on a new data algorithm called Diff-Match,which is developed by utilizing mode matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and seek out different sets of data. Its efficiency has been proved by its successful application in an enterprise of electric apparatus groups.
文摘The analysis of relevant standards and guidelines proved the lack of information on actions and activities concerning data warehouse testing. The absence of the complex data warehouse testing methodology seems to be crucial particularly in the phase of the data warehouse implementation. The aim of this article is to suggest basic data warehouse testing activities as a final part of data warehouse testing methodology. The testing activities that must be implemented in the process of the data warehouse testing can be split into four logical units regarding the multidimensional database testing, data pump testing, metadata and OLAP (Online Analytical Processing) testing. Between main testing activities can be included: revision of the multidimensional database scheme, optimizing of fact tables number, problem of data explosion, testing for correctness of aggregation and summation of data etc.
文摘In modern workforce management,the demand for new ways to maximize worker satisfaction,productivity,and security levels is endless.Workforce movement data such as those source data from an access control system can support this ongoing process with subsequent analysis.In this study,a solution to attaining this goal is proposed,based on the design and implementation of a data mart as part of a dimensional trajectory data warehouse(TDW)that acts as a repository for the management of movement data.A novel methodological approach is proposed for modeling multiple spatial and temporal dimensions in a logical model.The case study presented in this paper for modeling and analyzing workforce movement data is to support human resource management decision-making and the following discussion provides a representative example of the contribution of a TDW in the process of information management and decision support systems.The entire process of exporting,cleaning,consolidating,and transforming data is implemented to achieve an appropriate format for final import.Structured query language(SQL)queries demonstrate the convenience of dimensional design for data analysis,and valuable information can be extracted from the movements of employees on company premises to manage the workforce efficiently and effectively.Visual analytics through data visualization support the analysis and facilitate decisionmaking and business intelligence.
基金This work was supported by the Hainan Provincial Natural Science Foundation of China(project number:622RC723)the Education Department of Hainan Province(project number:Hnky2023-72).
文摘Students’grades not only serve as an effective indicator of their learning achievements but also to some extent reflect the completion of teaching tasks by the instructors.Currently,many universities across the country have collected and recorded various information about students and teachers in the school’s information management system,but it is only a simple storage record and has not effectively excavated hidden information,and data have not been fully utilized.Student performance information,enrolment information,course information,teaching plans,and teacher-related information are currently stored in separate databases,which are independent of each other,making it difficult to perform effective data analysis.Data warehousing technology can integrate various information and use data analysis software to excavate more high-value information,which is convenient for teaching evaluation and optimizing teaching strategies.Based on data warehousing technology,the article uses the hierarchical concept of data warehousing to construct the ODS layer,DWD layer,DWS layer and ETL layer.Facing the data warehousing topic,the article designs the data warehousing conceptual model,logical model,and physical model based on student performance,providing a model basis for later data mining.
基金This work has been supported by COMPETE:POCI-01-0145-FEDER-007043 and FCT(Fundação para a Ciência e Tecnologia)within the Project Scope:UID/CEC/00319/2013This work has been funded by the SusCity project(MITP-TB/CS/0026/2013)by Portugal Incentive System for Research and Technological Development,Project in co-promotion no 002814/2015(iFACTORY 2015-2018).
文摘In the era of Big Data,many NoSQL databases emerged for the storage and later processing of vast volumes of data,using data structures that can follow columnar,key-value,document or graph formats.For analytical contexts,requiring a Big Data Warehouse,Hive is used as the driving force,allowing the analysis of vast amounts of data.Data models in Hive are usually defined taking into consideration the queries that need to be answered.In this work,a set of rules is presented for the transformation of multidimensional data models into Hive tables,making available data at different levels of detail.These several levels are suited for answering different queries,depending on the analytical needs.After the identification of the Hive tables,this paper summarizes a demonstration case in which the implementation of a specific Big Data architecture shows how the evolution from a traditional Data Warehouse to a Big Data Warehouse is possible.
文摘Data structure and semantics of the traditional data model cannot effectively represent the data warehouse, it is difficult to effectively support online analytical processing (referred to as OLAP). This paper is propose a new multidimensional data model based on the partial ordering and mapping. The data model can fully express the complex data structure and semantics of data warehouse, and provide an OLAP operation as the core of the operation of algebra, support structure in levels of complex aggregation operation sequence, which can effectively support the application of OLAE The data model supports the concept of aggregation function constraint, and provides constraint mechanism of the hierarchy aggregation function.
基金This work is funded by Auvergne region,Feder,Agaetis,and Irstea.
文摘The problem of storage and querying of large volumes of spatial grids is an issue to solve.In this paper,we propose a method to optimize queries to aggregate raster grids stored in databases.In our approach,we propose to estimate the exact result rather than calculate the exact result.This approach reduces query execution time.One advantage of our method is that it does not require implementing or modifying functionalities of database management systems.Our approach is based on a new data structure and a specific model of SQL queries.Our work is applied here to relational data warehouses.
基金supported by the National Key Basic Research and Development Program of China under contract No.2006CB701305the National Natural Science Foundation of China under coutract No.40571129the National High-Technology Program of China under contract Nos 2002AA639400,2003AA604040 and 2003AA637030.
文摘Marine information has been increasing quickly. The traditional database technologies have disadvantages in manipulating large amounts of marine information which relates to the position in 3-D with the time. Recently, greater emphasis has been placed on GIS (geographical information system)to deal with the marine information. The GIS has shown great success for terrestrial applications in the last decades, but its use in marine fields has been far more restricted. One of the main reasons is that most of the GIS systems or their data models are designed for land applications. They cannot do well with the nature of the marine environment and for the marine information. And this becomes a fundamental challenge to the traditional GIS and its data structure. This work designed a data model, the raster-based spatio-temporal hierarchical data model (RSHDM), for the marine information system, or for the knowledge discovery fi'om spatio-temporal data, which bases itself on the nature of the marine data and overcomes the shortages of the current spatio-temporal models when they are used in the field. As an experiment, the marine fishery data warehouse (FDW) for marine fishery management was set up, which was based on the RSHDM. The experiment proved that the RSHDM can do well with the data and can extract easily the aggregations that the management needs at different levels.
文摘Based on the experience and achievement of the"China Digital Ocean", the classification plan for Marine data elements is made, which can be classified into five, including marine point elements, marine line elements, marine polygon elements, marine grid elements and marine dynamic elements. In this paper, the technology of features and object-oriented method, a spatial-temporal data model is proposed, which can be applied in the large information system engineering like the "Digital Ocean", and this paper discusses the application of spatial data model, marine three-dimensional raster data model and relation data model in the building of Data Warehouse in "China Digital Ocean", and concludes the merits of these models.
文摘In order to exchange and share information among the conceptual models of data warehouse, and to build a solid base for the integration and share of metadata, a new multidimensional concept model is presented based on XML and its DTD is defined, which can perfectly describe various semantic characteristics of multidimensional conceptual model. According to the multidimensional conceptual modeling technique which is based on UML, the mapping algorithm between the multidimensional conceptual model is described based on XML and UML class diagram, and an application base for the wide use of this technique is given.
文摘The objectives of quality management systems which are based on data warehouses are to acquire, store, and process quality control data within an enterprise, and to facilitate analysis, control and decision making based on this data. This paper discusses the DB/ODS/DW (traditional database/operational data store/data warehouse) architecture, data granularity and data partition in the data warehouse, describes the data model, and presents the client/server platform model.