Funding: This project was supported by the Key Research and Development Program (2018YFB1003403), the National Natural Science Foundation of China (Grant Nos. 61732014, 61672432, 61672434), and the Natural Science Basic Research Plan in Shaanxi Province of China (2017JM6104).
Abstract: Collecting statistics is a time- and resource-consuming operation in database systems. It is even more challenging to collect statistics efficiently without degrading system performance while keeping them correct in a distributed database. Traditional strategies usually consider only one dimension when collecting statistics and therefore lack adaptiveness. In this paper, we propose an adaptive strategy for statistics collecting (ASC), which balances collection efficiency, correctness of statistics, and impact on system performance. We formally define the procedure of collecting statistics, abstract the relationships among collection efficiency, correctness of statistics, and impact on system performance, and introduce an elastic structure (ESI) that stores the information generated while the strategy runs. Using the information stored in the ESI, ASC picks an appropriate time to trigger collection, filters out unnecessary tasks, and allocates collection tasks to appropriate execution locations with suitable execution models. We implement and evaluate our strategy in a distributed database. Experiments show that, compared with other strategies, our solution generally improves the efficiency and correctness of statistics collection and reduces the negative effect on system performance.
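As an illustration only, the idea of picking a time to trigger collection and filtering unnecessary tasks can be sketched as a simple predicate. This is not the ASC algorithm from the paper: the function name `should_collect`, its inputs, and the thresholds `stale_ratio` and `max_load` are all hypothetical assumptions chosen to show the trade-off between stale statistics and system load.

```python
def should_collect(rows_total, rows_modified, cpu_load,
                   stale_ratio=0.2, max_load=0.7):
    """Hypothetical trigger: collect only if statistics look stale
    and the node is not too busy. Not the paper's actual policy."""
    if cpu_load > max_load:
        # Defer collection so it does not hurt foreground queries.
        return False
    if rows_total == 0:
        # Empty table: collect only if something has changed at all.
        return rows_modified > 0
    # Filter out tasks whose tables have changed too little to matter.
    return rows_modified / rows_total >= stale_ratio


# Usage: a table with 30% modified rows on a lightly loaded node.
print(should_collect(rows_total=1000, rows_modified=300, cpu_load=0.3))
```

A real strategy would also have to decide where and in which execution model each task runs, which this sketch leaves out.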
Abstract: This book chapter is an extended version of the research paper entitled “Use of Component Integration Services in Multidatabase Systems”, which was presented at and published by the 13th ISITA, the National Conference of Recent Trends in Mathematical and Computer Sciences, T.M.B. University, Bhagalpur, India, January 3-4, 2015. Information is widely distributed across many remote, distributed, and autonomous databases (local component databases) in heterogeneous formats. Integrating heterogeneous remote databases is a difficult task that several projects have already addressed to some extent. In this chapter, we discuss how to integrate heterogeneous distributed local relational databases, which we focus on because of their simplicity, excellent security, performance, power, flexibility, data independence, support for new hardware technologies, and worldwide adoption. We also discuss how to constitute a global conceptual schema in the multidatabase system using Sybase Adaptive Server Enterprise’s Component Integration Services (CIS) and OmniConnect. This is feasible for higher education institutions and commercial industries alike. For higher education institutions, CIS will improve IT integration with their subsidiaries or with other institutions within the country and abroad in terms of educational management, teaching, learning, and research, including promoting international students’ academic integration, collaboration, and governance. This is an innovative strategy to support the modernization and large-scale expansion of academic institutions, and it can be regarded as IT-institutional alignment within a higher education context. It will also support achieving one of the sustainable development goals set by the United Nations: “Goal 4: ensure inclusive and quality education for all and promote lifelong learning”.
However, the process of IT integration into higher education institutions must be thoroughly evaluated, identifying the vital data access points. In this chapter, Section 1 provides an introduction, including the evolution of various database systems and data models and the emergence of multidatabase systems and their importance. Section 2 discusses Component Integration Services (CIS) and OmniConnect and considers heterogeneous relational distributed local databases from an academic perspective. Section 3 discusses the Sybase Adaptive Server Enterprise (ASE). Section 4 discusses the role of CIS and OmniConnect of Sybase ASE in the multidatabase system. Section 5 shows the database architectural framework. Section 6 provides an implementation overview of the global conceptual schema in the multidatabase system. Section 7 discusses query processing in CIS, and Section 8 concludes the chapter. The chapter should be useful to students, as it covers the evolution of databases and data models and the emergence of multidatabases. Where additional information is cited, its source is given in the references.
Abstract: Dynamic programming (DP) is an effective query optimization approach for selecting an appropriate join order for multi-table joins in a relational database management system (RDBMS). This method is extended and made available in a distributed DBMS (D-DBMS). The structure of the optimal solution is first characterized according to the distribution status of tables and data, and then the recurrence relations between a problem and its sub-problems are defined recursively. DP in a D-DBMS has the same time complexity as in a centralized DBMS, while being able to solve the considerably more complex optimization problem of multi-table joins in a D-DBMS. The effectiveness of this optimization strategy has been demonstrated experimentally.
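To make the recurrence between a join problem and its sub-problems concrete, here is a minimal sketch of the classic centralized DP over table subsets. It is not the paper's distributed formulation: it ignores data placement and transfer costs, and the cost model (sum of estimated intermediate result sizes), the function name `best_join_order`, and the example cardinalities and selectivities are assumptions made for illustration.

```python
from itertools import combinations


def best_join_order(cards, sel):
    """cards: {table: row count}; sel: {frozenset({a, b}): join selectivity}.
    Returns (cost, plan), where cost sums estimated intermediate result sizes."""
    tables = sorted(cards)

    def size(subset):
        # Estimated rows: product of base cardinalities times the selectivity
        # of every join predicate whose two tables are both in the subset.
        rows = 1.0
        for t in subset:
            rows *= cards[t]
        for pair, s in sel.items():
            if pair <= subset:
                rows *= s
        return rows

    # best[subset] = (cost, plan); a single table needs no join.
    best = {frozenset([t]): (0.0, t) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            subset = frozenset(combo)
            out_rows = size(subset)
            candidates = []
            # Recurrence: try every split into two non-empty sub-problems.
            for r in range(1, k):
                for left_combo in combinations(combo, r):
                    left = frozenset(left_combo)
                    right = subset - left
                    cost = best[left][0] + best[right][0] + out_rows
                    candidates.append((cost, (best[left][1], best[right][1])))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(tables)]


# Usage with hypothetical statistics for three tables.
cards = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.1}
cost, plan = best_join_order(cards, sel)
print(cost, plan)
```

A distributed variant would extend the DP state, for example with the site holding each intermediate result, and add shipping costs to the recurrence; that extension is not shown here.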
Abstract: Business policy making, market analysis, corporate decision making, fraud detection, and similar tasks require analyzing huge amounts of data, which are generally drawn from different sources. Researchers use data mining to perform such tasks. Data mining techniques find hidden information in large data sources and are used in various fields: artificial intelligence, banking, health and medicine, corruption, legal issues, corporate business, marketing, etc. Special attention is given to association rules, data mining algorithms, decision trees, and the distributed approach. Data is growing larger and becoming geographically dispersed, so it is difficult to obtain good results from a single central data source. For knowledge discovery, we have to work with distributed databases. In addition, security and privacy considerations are another factor that discourages working with centralized data. For these reasons, distributed databases are essential for future processing. In this paper, we propose a framework for studying data mining in a distributed environment and for bringing out actionable knowledge. We present the levels through which actionable knowledge can be generated and discuss possible tools and techniques for these levels.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62002119, 61977026, 62072180, and 61772202), the Fundamental Research Funds for the Central Universities, Southwest Minzu University (2021PTJS23), and the Open Fund of Shanghai Engineering Research Center on Big Data Management System.
Abstract: On-line transaction processing (OLTP) systems rely on transaction logging and a quorum-based consensus protocol to guarantee durability, high availability, and strong consistency. This makes the log manager a key component of distributed database management systems (DDBMSs). The leader of a DDBMS commonly adopts a centralized logging method to write log entries to a stable storage device and uses a constant log replication strategy to periodically synchronize its state to followers. With the advent of new hardware and highly parallel transaction processing, the traditional centralized logging design limits scalability, and a constant trigger condition for replication cannot always maintain optimal performance under dynamic workloads. In this paper, we propose a new log manager named Salmo with scalable logging and adaptive replication for distributed database systems. The scalable logging eliminates centralized contention by using a highly concurrent data structure and fast log hole tracking. The kernel of adaptive replication is an adaptive log shipping method, which dynamically adjusts the number of log entries transmitted between the leader and followers based on the real-time workload. We implemented and evaluated Salmo in the open-source transaction processing systems Cedar and DBx1000. Experimental results show that Salmo scales well as the number of worker threads increases, improves peak throughput by 1.56× and reduces latency by more than 4× over the log replication of Raft, and maintains efficient and stable performance under dynamic workloads.
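The contrast between a constant replication trigger and one that follows the workload can be sketched as follows. This is not Salmo's algorithm, which the abstract does not specify. The class `AdaptiveBatcher`, the exponential-moving-average policy, and all parameter names and defaults are hypothetical assumptions used only to show a batch size that tracks the observed arrival rate of log entries.

```python
class AdaptiveBatcher:
    """Hypothetical policy: size each shipped batch from a smoothed arrival rate."""

    def __init__(self, min_batch=1, max_batch=1024, interval_s=0.001, alpha=0.5):
        self.min_batch = min_batch
        self.max_batch = max_batch
        self.interval_s = interval_s  # target time between shipments
        self.alpha = alpha            # smoothing factor of the moving average
        self.rate = 0.0               # smoothed log entries per second

    def observe(self, entries, elapsed_s):
        """Fold a measurement (entries appended in elapsed_s seconds) into the average."""
        sample = entries / elapsed_s
        self.rate = self.alpha * sample + (1 - self.alpha) * self.rate

    def batch_size(self):
        """Entries to ship next: expected arrivals per interval, clamped to bounds."""
        want = round(self.rate * self.interval_s)
        return max(self.min_batch, min(self.max_batch, want))


# Usage: the batch grows as the observed workload increases.
batcher = AdaptiveBatcher(max_batch=100, interval_s=0.01, alpha=1.0)
batcher.observe(entries=5000, elapsed_s=1.0)
print(batcher.batch_size())
```

Clamping keeps latency bounded under light load (small batches are still shipped) and caps message size under bursts; a constant strategy would correspond to returning a fixed value from `batch_size`.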