Abstract: This book chapter is an extended version of the research paper entitled "Use of Component Integration Services in Multidatabase Systems", which was presented and published at the 13th ISITA, the National Conference of Recent Trends in Mathematical and Computer Sciences, T.M.B. University, Bhagalpur, India, January 3-4, 2015. Information is widely distributed across many remote, distributed, and autonomous databases (local component databases) in heterogeneous formats. Integrating heterogeneous remote databases is a difficult task, and several projects have already addressed it to varying extents. In this chapter, we discuss how to integrate heterogeneous distributed local relational databases, chosen for their simplicity, excellent security, performance, power, flexibility, data independence, support for new hardware technologies, and worldwide spread. We also discuss how to constitute a global conceptual schema in the multidatabase system using Sybase Adaptive Server Enterprise's Component Integration Services (CIS) and OmniConnect. This is feasible for higher education institutions as well as commercial industries. For higher educational institutions, CIS will improve IT integration with their subsidiaries or with other institutions within the country and abroad in terms of educational management, teaching, learning, and research, including promoting international students' academic integration, collaboration, and governance. This is an innovative strategy to support the modernization and large-scale expansion of academic institutions, and can be considered IT-institutional alignment within a higher education context. It also supports one of the sustainable development goals set by the United Nations: "Goal 4: ensure inclusive and quality education for all and promote lifelong learning". However, the process of IT integration into higher educational institutions must be thoroughly evaluated, and the vital data access points identified. In this chapter, Section 1 provides an introduction, including the evolution of various database systems and data models and the emergence of multidatabase systems and their importance; Section 2 discusses Component Integration Services (CIS) and OmniConnect, considering heterogeneous distributed local relational databases from an academic perspective; Section 3 discusses Sybase Adaptive Server Enterprise (ASE); Section 4 discusses the role of CIS and OmniConnect of Sybase ASE in the multidatabase system; Section 5 shows the database architectural framework; Section 6 provides an implementation overview of the global conceptual schema in the multidatabase system; Section 7 discusses query processing in CIS; and finally, Section 8 concludes the chapter. The chapter will also be of considerable help to students, as it covers the evolution of databases and data models and the emergence of multidatabases. Where additional information is cited, the source of each citation is properly given in the references section.
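To make the global-conceptual-schema construction concrete, here is a minimal sketch of the kind of setup script a DBA might generate for Sybase ASE CIS. The server, database, and table names are hypothetical, and the exact sp_addserver arguments and proxy-table pathname syntax should be checked against the CIS documentation for the ASE version in use.

```python
# Minimal sketch: composing the Sybase ASE CIS commands that surface remote
# tables as local proxy tables, forming one global conceptual schema.
# All server/database/table names here are hypothetical examples.

# Remote component databases (logical server name -> network address).
component_servers = {
    "CAMPUS_A": "hostA:5000",
    "CAMPUS_B": "hostB:5000",
}

# Remote tables to expose locally: proxy name -> four-part remote path.
proxy_tables = {
    "students_a": "CAMPUS_A.university.dbo.students",
    "students_b": "CAMPUS_B.university.dbo.students",
}

def cis_setup_script() -> str:
    """Build the T-SQL a DBA would run on the global ASE server."""
    stmts = []
    for server, addr in component_servers.items():
        # Register each component server with the global server.
        stmts.append(f"exec sp_addserver {server}, ASEnterprise, '{addr}'")
    for local_name, remote_path in proxy_tables.items():
        # CIS maps each remote table to a local proxy table; queries against
        # the proxy are forwarded to the component database transparently.
        stmts.append(f'create proxy_table {local_name} at "{remote_path}"')
    # A union view over the proxies acts as the global conceptual schema.
    stmts.append(
        "create view all_students as\n"
        "  select * from students_a union all select * from students_b"
    )
    return "\ngo\n".join(stmts) + "\ngo"

if __name__ == "__main__":
    print(cis_setup_script())  # feed the output to isql on the global server
```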
Abstract: The necessity and feasibility of introducing attribute weights into a digital fingerprinting system are presented, and a weighted algorithm for fingerprinting relational databases for traitor tracing is proposed. Higher weights are assigned to more significant attributes, so important attributes are fingerprinted more frequently than others. Finally, the robustness of the proposed algorithm, such as its performance against collusion attacks, is analyzed. Experimental results demonstrate the superiority of the algorithm.
Abstract: This paper presents a development of extended Cellular Automata (CA), based on relational databases (RDB), to model dynamic interactions among spatial objects. The integration of Geographical Information Systems (GIS) and CA offers a great advantage for simulating geographical processes, but standard CA imposes restrictions on cell shape, neighbourhood, and neighbour rules that limit its ability to simulate complex, real-world environments. This paper discusses a cell's spatial relations based on the spatial object's geometrical and non-geometrical characteristics, extends the cell's neighbour definition, and holds that a cell's neighbours arise not only from spatial adjacency but also from attribute correlation. It then proposes that spatial relations between two different cells can be divided into three types: spatial adjacency, neighbourhood, and complicated separation. Within the traditional framework it is impossible to remove CA's restrictions completely. RDB-based CA is an academic experiment in which certain fields are designed to describe the essential information needed to define and select a cell's neighbours. The cultural innovation diffusion system has multiple forms of spatial diffusion and inherited characteristics that RDB-based CA is capable of simulating more effectively. Finally, this paper details a successful case study on the diffusion of fashion wear trends. Compared to the original CA, the RDB-based CA is a more natural and efficient representation of human knowledge over space, and is an effective tool for simulating complex systems that have multiple forms of spatial diffusion.
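As a minimal sketch of the idea in the abstract above, the snippet below keeps cells and their neighbour relations in relational tables, so a "neighbour" can be a spatially adjacent cell or an attribute-correlated one; the schema and the adoption rule are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch (assumed schema): a relational-database-backed CA where a
# cell's neighbours are rows in a table, so neighbourhood can encode both
# spatial adjacency and attribute correlation, not just grid adjacency.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE cell(id INTEGER PRIMARY KEY, adopted INTEGER);
CREATE TABLE neighbour(cell INTEGER, nbr INTEGER, relation TEXT);
""")
# Cell 1 has adopted the innovation; 2 is spatially adjacent to 1;
# 3 is spatially separate but attribute-correlated with 1; 4 neighbours 2.
db.executemany("INSERT INTO cell VALUES(?,?)", [(1, 1), (2, 0), (3, 0), (4, 0)])
db.executemany("INSERT INTO neighbour VALUES(?,?,?)", [
    (2, 1, "adjacency"), (3, 1, "attribute"), (4, 2, "adjacency"),
])

def step() -> None:
    """One CA transition: a cell adopts if any of its neighbours has adopted."""
    db.execute("""
        UPDATE cell SET adopted = 1 WHERE id IN (
            SELECT n.cell FROM neighbour n
            JOIN cell src ON src.id = n.nbr
            WHERE src.adopted = 1)
    """)

for t in range(3):
    step()
    print(t, db.execute("SELECT id FROM cell WHERE adopted=1").fetchall())
```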
Funding: This project was supported by the University of Jeddah under Grant Number UJ-02-014-ICGR.
Abstract: With the widespread use of relational databases in various real-life applications, maintaining integrity and providing copyright protection are attracting keen interest from researchers. For this purpose, watermarking has been used for quite a long time. Watermarking requires the role of a trusted third party and a mechanism to extract digital signatures (watermarks) to prove the ownership of data under dispute. This is often inefficient, as a lot of processing is required. Moreover, certain malicious attacks, like additive attacks, can give rise to a situation in which more than one party can claim ownership of the same data by inserting and detecting their own set of watermarks in it. To solve this problem, we propose to use blockchain technology, as the trusted third party, along with watermarking to provide a means of rights protection for relational databases. Using a blockchain to record the copyright information alongside watermarking helps to secure the watermark, since changing the blockchain is very difficult. In this way, we combine the resilience of our watermarking scheme with the strength of blockchain technology, which protects the digital rights information from alteration, to design and implement a robust scheme for digital rights protection of relational databases. We also discuss how the proposed scheme can be used for version control. The proposed technique works with non-numeric features of the relational database and, unlike most existing techniques, does not target only selected tuples or a portion (subset) of the database for watermark embedding; as a result, the chance of selecting a subset containing no watermark decreases automatically. The technique employs a zero-watermarking approach, so no intentional error (watermark) is added to the original dataset. The results of the experiments prove the effectiveness of the proposed scheme.
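The following is a minimal sketch of the two ingredients named in the abstract above: a zero-watermark derived from non-numeric features (so nothing is embedded in the data) and registration of the resulting digest on an append-only chain. The chain here is a toy stand-in for a real blockchain, and all field names are hypothetical.

```python
# Minimal sketch of zero-watermarking plus ledger registration. No bits are
# embedded in the data; a keyed digest is *derived* from non-numeric features
# and recorded on a toy append-only hash chain.
import hashlib, json, time

def zero_watermark(rows: list[dict], key: str) -> str:
    """Derive a keyed digest from the non-numeric attribute values."""
    h = hashlib.sha256(key.encode())
    for row in rows:
        for col, val in sorted(row.items()):
            if isinstance(val, str):          # non-numeric features only
                h.update(f"{col}={val}".encode())
    return h.hexdigest()

chain: list[dict] = []                        # toy append-only ledger

def register(owner: str, digest: str) -> None:
    """Append an ownership record; each block seals the previous one."""
    prev = chain[-1]["block_hash"] if chain else "0" * 64
    block = {"owner": owner, "digest": digest, "ts": time.time(), "prev": prev}
    block["block_hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    chain.append(block)

rows = [{"name": "alice", "dept": "cs", "salary": 70000}]
register("OwnerA", zero_watermark(rows, key="owner-secret"))
# Ownership check: recompute the digest and find its earliest registration.
print(zero_watermark(rows, "owner-secret") == chain[0]["digest"])
```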
Funding: Supported by the Aeronautics Science Foundation of China (02F52033), the High-Technology Research Project of Jiangsu Province (BG2004005), and the Youth Research Foundation of Qufu Normal University (XJ02057).
Abstract: A weighted algorithm for watermarking relational databases for copyright protection is presented. The probability of watermarking an attribute is assigned according to its weight, which is decided by the owner of the database. A one-way hash function and a secret key known only to the owner of the data are used to select the tuples and bits to mark. By assigning high weights to significant attributes, the scheme ensures that important attributes have a greater chance of being marked than less important ones. Experimental results show that the proposed scheme is robust against various forms of attack and has perfect immunity to subset attacks.
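A minimal sketch of the selection mechanics described above follows: a keyed one-way hash picks which tuples to mark, the owner-assigned weights bias which attribute is marked, and another keyed hash picks the bit. The weights, the gamma parameter, and the column names are illustrative assumptions.

```python
# Minimal sketch of weighted watermark embedding, assuming numeric attributes
# whose least significant bit may be perturbed.
import hmac, hashlib

SECRET_KEY = b"owner-only-secret"
WEIGHTS = {"salary": 5, "bonus": 3, "age": 1}   # owner-chosen attribute weights
GAMMA = 3                                       # mark roughly 1/GAMMA tuples

def h(key: bytes, *parts: str) -> int:
    """Keyed one-way hash reduced to an integer."""
    digest = hmac.new(key, ".".join(parts).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")

def mark(tuple_pk: str, row: dict) -> dict:
    """Flip the LSB of one weight-selected attribute of selected tuples."""
    if h(SECRET_KEY, tuple_pk) % GAMMA != 0:
        return row                               # tuple not selected
    # Weighted attribute choice: heavier attributes own more slots.
    slots = [a for a, w in WEIGHTS.items() for _ in range(w)]
    attr = slots[h(SECRET_KEY, tuple_pk, "attr") % len(slots)]
    bit = h(SECRET_KEY, tuple_pk, "bit") & 1
    row = dict(row)
    row[attr] = (int(row[attr]) & ~1) | bit      # set LSB to the keyed bit
    return row

for pk in ("emp-040", "emp-041", "emp-042"):     # roughly one in GAMMA is marked
    print(pk, mark(pk, {"salary": 70000, "bonus": 500, "age": 31}))
```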
Abstract: In this paper, an entity-relation data model for integrating spatio-temporal data is designed. With this design, spatio-temporal data can be stored effectively and spatio-temporal analysis can be realized easily.
Abstract: We developed a parallel object-relational DBMS named PORLES. It uses the BSP model as its parallel computing model and monoid calculus as the basis of its data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system, and parallel access method in detail.
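Since the abstract leans on monoid calculus, here is a minimal illustration of that view of queries: a query is a homomorphism from a collection into a target monoid, which is what makes per-partition folding plus BSP-style merging legal. The monoids and the toy relation are illustrative, not PORLES's actual API.

```python
# Minimal sketch of the monoid-calculus view of queries: map each element,
# then combine with an associative merge starting from the monoid's zero.
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class Monoid:
    zero: Any
    merge: Callable[[Any, Any], Any]

def hom(f: Callable[[Any], Any], m: Monoid, xs: Iterable) -> Any:
    """Monoid homomorphism: map every element with f, combine with merge."""
    acc = m.zero
    for x in xs:
        acc = m.merge(acc, f(x))
    return acc

sum_m = Monoid(0, lambda a, b: a + b)           # aggregation monoid
bag_m = Monoid([], lambda a, b: a + b)          # bag (list) collection monoid

emps = [("cs", 70), ("ee", 60), ("cs", 80)]
# "select salary from emps where dept = 'cs'" as a bag homomorphism:
print(hom(lambda e: [e[1]] if e[0] == "cs" else [], bag_m, emps))  # [70, 80]
# The same shape gives aggregation; each data partition can be folded on its
# own BSP processor and the partial results merged in a superstep:
print(hom(lambda e: e[1], sum_m, emps))                            # 210
```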
Funding: Key Project of China Earthquake Administration during the Ninth Five-Year Plan (951204).
Abstract: With the massive growth of seismic data, a new method is required to manage them. This paper reports a design method for a relational database based on a tree structure. Compared with other designs, it is not only simpler and easier for organizing data but also simplifies the design process of the database. This method has been used to design the database of the earthquake monitor center station of the earthquake monitoring system for the Yangtze River Three Gorges Project and has shown good results.
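A minimal sketch of a tree kept in a relational table follows: one table with a parent_id column holds the whole monitoring hierarchy, and a recursive query walks any subtree. The schema and node names are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch (assumed schema): a tree stored in one relational table via
# a parent_id column, so heterogeneous monitoring data hangs off one hierarchy.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node(
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),    -- NULL for the root
    name TEXT
);
""")
db.executemany("INSERT INTO node VALUES(?,?,?)", [
    (1, None, "monitor center"),
    (2, 1, "station A"), (3, 1, "station B"),
    (4, 2, "seismometer A1"), (5, 2, "seismometer A2"),
])

# Walk a subtree with a recursive CTE instead of one table per level.
subtree = db.execute("""
    WITH RECURSIVE sub(id, name, depth) AS (
        SELECT id, name, 0 FROM node WHERE id = ?
        UNION ALL
        SELECT n.id, n.name, s.depth + 1
        FROM node n JOIN sub s ON n.parent_id = s.id)
    SELECT name, depth FROM sub ORDER BY depth
""", (2,)).fetchall()
print(subtree)   # [('station A', 0), ('seismometer A1', 1), ('seismometer A2', 1)]
```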
Funding: Supported by the Taiwan Ministry of Economic Affairs and the Institute for Information Industry under the project titled "Fundamental Industrial Technology Development Program (1/4)".
Abstract: For a transaction processing system to operate effectively and efficiently in cloud environments, it is important to distribute huge amounts of data while guaranteeing the ACID (atomic, consistent, isolated, and durable) properties. Moreover, database partition and migration tools can help transplant conventional relational database systems to the cloud environment rather than rebuilding a new system. This paper proposes a database distribution management (DBDM) system, which partitions or replicates the data according to the transaction behaviors of the application system. The principal strategy of DBDM is to keep together the data used in a single transaction, thus avoiding massive transmission of records in join operations. The proposed system has been implemented successfully, and preliminary experiments show that the DBDM performs database partition and migration effectively. The DBDM system is also modularly designed to adapt to different database management systems (DBMSs) or different partition algorithms.
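To illustrate the stated strategy of keeping together the data used in a single transaction, here is a minimal sketch that groups tables co-accessed by the same transactions via union-find; the transaction log is illustrative, and this is one plausible reading of the strategy rather than DBDM's actual algorithm.

```python
# Minimal sketch: tables touched by the same transaction end up in the same
# partition, so joins inside a transaction stay node-local.
from collections import defaultdict

txn_log = [                       # tables accessed per observed transaction
    {"orders", "customers"},
    {"orders", "order_items"},
    {"products"},
]

parent: dict[str, str] = {}

def find(t: str) -> str:
    """Union-find root lookup with path halving."""
    parent.setdefault(t, t)
    while parent[t] != t:
        parent[t] = parent[parent[t]]
        t = parent[t]
    return t

def union(a: str, b: str) -> None:
    parent[find(a)] = find(b)

# Union every pair of tables co-accessed by a transaction.
for tables in txn_log:
    tables = list(tables)
    for other in tables[1:]:
        union(tables[0], other)

groups = defaultdict(list)
for table in parent:
    groups[find(table)].append(table)
print(list(groups.values()))
# e.g. [['orders', 'customers', 'order_items'], ['products']]
```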
Funding: Supported by the Universiti Putra Malaysia Grant Scheme (Putra Grant) (GP/2020/9692500).
Abstract: Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from a relational database to a NoSQL database. A number of schema transformation techniques have been proposed to improve the data transformation process, yielding better query processing times than the relational database. However, these approaches produce redundant tables in the resulting schema, which in turn consume large amounts of unnecessary storage and produce high query processing times due to the redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from a relational database to a column-oriented database is proposed. The proposed schema transformation technique is based on the combination of a denormalization approach, data access patterns, and multiple-nested schemas. To validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, with the query processing time and the storage size compared. Based on the experimental results, the proposed transformation technique shows significant improvement in query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
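As a minimal sketch of the denormalization step named above, the snippet below collapses relational parent and child rows into one nested document per parent, so reads follow the access pattern instead of run-time joins. The table and field names are illustrative, not the paper's schema.

```python
# Minimal sketch: parent rows plus child rows (joined via a foreign key in the
# relational schema) become one multiple-nested document per parent.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 60.0},
]

def nest(parents, children, fk, as_field):
    """Embed each child row under its parent row (multiple-nested schema)."""
    by_parent = {}
    for child in children:
        key = child[fk]
        slim = {k: v for k, v in child.items() if k != fk}   # drop the join key
        by_parent.setdefault(key, []).append(slim)
    return [dict(p, **{as_field: by_parent.get(p["id"], [])}) for p in parents]

docs = nest(customers, orders, fk="customer_id", as_field="orders")
print(docs[0])
# {'id': 1, 'name': 'Alice', 'orders': [{'id': 10, 'total': 25.0}, ...]}
# Each document can be inserted as-is, e.g. collection.insert_many(docs)
# with PyMongo, replacing the customers-orders join entirely.
```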
Funding: Supported by the Weaponry Equipment Pre-Research Foundation of the PLA Equipment Ministry (No. 9140A06050409JB8102) and the Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Abstract: To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (Simple Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units, which are then joined according to the semantic query to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed: in the worst case, the query decomposition algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, whose results show that when the length of the query is less than 8, the query processing algorithms provide satisfactory performance.
Abstract: This paper focuses on exporting relational data into Extensible Markup Language (XML). First, the characteristics of relational schemas represented by E-R diagrams and of XML document type definitions (DTDs) are analyzed. Secondly, the corresponding mapping rules are proposed. Finally, an algorithm based on edge tables is presented. There are two key points in the algorithm. One is that the edge table is used to store the information of the relational dictionary, which accounts for the efficiency of the algorithm. The other is that structural information can be obtained from the resulting DTDs, and other applications can use this structural information to optimize their queries.
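A minimal sketch of the edge-table idea follows: parent-child entity edges distilled from the relational dictionary drive DTD generation. The example schema, cardinality markers, and field lists are illustrative assumptions, not the paper's mapping rules.

```python
# Minimal sketch: an edge table (parent entity, child entity, cardinality)
# plus leaf fields is enough to emit a DTD for the exported XML.
edge_table = [
    ("university", "department", "+"),   # one or more departments
    ("department", "course", "*"),       # zero or more courses
]
leaf_fields = {"department": ["name"], "course": ["title", "credits"]}

def build_dtd(root: str) -> str:
    children = {}
    for parent, child, card in edge_table:
        children.setdefault(parent, []).append(child + card)
    lines = []
    todo = [root]
    while todo:                                  # walk the edge table
        elem = todo.pop()
        subs = children.get(elem, []) + leaf_fields.get(elem, [])
        lines.append(f"<!ELEMENT {elem} ({', '.join(subs) or '#PCDATA'})>")
        todo += [c.rstrip('*+?') for c in children.get(elem, [])]
        lines += [f"<!ELEMENT {f} (#PCDATA)>" for f in leaf_fields.get(elem, [])]
    return "\n".join(lines)

print(build_dtd("university"))
# <!ELEMENT university (department+)>
# <!ELEMENT department (course*, name)> ... etc.
```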
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. AA420060).
Abstract: In the course of network-supported collaborative design, data processing plays a vital role. Much effort has been spent in this area, and many kinds of approaches have been proposed. Based on the relevant materials, this paper presents an Extensible Markup Language (XML) based strategy for several important data processing problems in network-supported collaborative design, such as representing the Standard for the Exchange of Product model data (STEP) with XML in product information expression and managing XML documents using a relational database. The paper gives a detailed exposition of how to clarify the mapping between the XML structure and the relational database structure and how XML-QL queries can be translated into Structured Query Language (SQL) queries. Finally, the structure of the XML-based data processing system is presented.
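To illustrate the translation direction described above, here is a minimal sketch that stores XML generically in a relational element table and turns a simple path query into SQL self-joins. The storage layout and the path-only query form are illustrative simplifications of XML-QL-to-SQL translation.

```python
# Minimal sketch: XML elements in one relational table, and /a/b-style paths
# translated into parent-child self-joins.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE elem(id INTEGER, parent INTEGER, tag TEXT, val TEXT)")
db.executemany("INSERT INTO elem VALUES(?,?,?,?)", [
    (1, None, "part", None), (2, 1, "name", "bolt"), (3, 1, "cost", "0.10"),
    (4, None, "part", None), (5, 4, "name", "nut"),
])

def path_to_sql(path: str) -> str:
    """Translate /a/b/c into a chain of parent-child self-joins."""
    tags = path.strip("/").split("/")
    froms = [f"elem e{i}" for i in range(len(tags))]
    conds = [f"e{i}.tag = '{tags[i]}'" for i in range(len(tags))]
    conds += [f"e{i}.parent = e{i-1}.id" for i in range(1, len(tags))]
    last = len(tags) - 1
    return (f"SELECT e{last}.val FROM " + ", ".join(froms)
            + " WHERE " + " AND ".join(conds))

sql = path_to_sql("/part/name")
print(sql)
print(db.execute(sql).fetchall())   # [('bolt',), ('nut',)]
```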
Funding: Supported by the National Science Foundation for Distinguished Young Scholars of China under Grant Nos. 61225012 and 71325002, the Specialized Research Fund of the Doctoral Program of Higher Education for the Priority Development Areas under Grant No. 20120042130003, and the Liaoning BaiQianWan Talents Program under Grant No. 2013921068.
Abstract: With the challenges brought by the expansion of network scale, as well as the diversity of equipment and the complexity of network protocols, many self-configurable systems have been proposed that combine formal specification and model finding techniques. In this paper, we pay particular attention to formal specifications of network information, i.e., exploring principles and an algorithm to map network information (topology, devices, status, etc.) to Alloy specifications. We first model network information in relational form, which is easy to realize because network information is structured in nature. Then we map the relational data to Alloy specifications according to our novel data mapping principles and algorithm. Based on this transformation of relational data, it is possible to map network information to Alloy specifications automatically. We evaluate our data mapping principles and algorithm by applying them to a practical application scenario. The results illustrate that we can find a model for the task within a tolerable time interval, which implies that our approach can convert relational data to Alloy specifications correctly and efficiently.
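Here is a minimal sketch of the mapping direction described above: relational network data emitted as Alloy signatures and a fact pinning down the topology relation. The device tables and the mapping style are illustrative assumptions, not the paper's exact principles.

```python
# Minimal sketch: relational rows -> Alloy signatures and relation facts.
devices = ["r1", "r2", "s1"]                 # two routers and a switch
links = [("r1", "r2"), ("r2", "s1")]         # physical topology table

def to_alloy() -> str:
    spec = ["abstract sig Device { link: set Device }"]
    # One singleton signature per concrete device row.
    for d in devices:
        spec.append(f"one sig {d.upper()} extends Device {{}}")
    # The link relation pinned down exactly by a fact.
    pairs = " + ".join(f"{a.upper()}->{b.upper()}" for a, b in links)
    spec.append(f"fact Topology {{ link = {pairs} }}")
    # Ask the Alloy Analyzer to search for a model.
    spec.append("run {} for 3")
    return "\n".join(spec)

print(to_alloy())   # paste the output into the Alloy Analyzer
```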
Abstract: Data mining, also known as knowledge discovery in data (KDD), is the process of uncovering patterns and other valuable information from large data sets. According to https://www.geeksforgeeks.org/data-mining/, it can also be referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. With advancing research in the health sector, a multitude of data is available in healthcare. The general problem then becomes how to use the existing information in a more useful, targeted way, and data mining is the best available technique for this. The objective of this paper is to review and analyse some of the different data mining techniques, such as classification, clustering, regression, etc., as applied in the healthcare domain.
Funding: Supported by CNPq (Brazilian National Council of Technological and Scientific Development) under grant numbers 305484/2012-5 and 104200/2013-8.
Abstract: In traditional database applications, queries are intended to retrieve data satisfying precise conditions. As a result, thousands of records can be retrieved (an overabundant answer) or, even worse, no data at all (an empty answer). In both cases, the queries must be reformulated to produce more significant results, and typically a user submits many related queries before being finally satisfied. To overcome these problems, this paper proposes a unified solution in the framework of flexible queries with fuzzy semantics. This solution, based on the concept of semantic proximity and implemented in a tool for flexible query answering, allows the automatic reformulation of queries with empty or overabundant answers.
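A minimal sketch of such automatic reformulation follows: a fuzzy predicate has a membership function, and the acceptance threshold is relaxed on empty answers and tightened on overabundant ones. The trapezoidal membership, the step size, and the data are illustrative assumptions, not the paper's semantic-proximity machinery.

```python
# Minimal sketch: threshold-driven reformulation of a fuzzy query.
def cheap(price: float) -> float:
    """Membership of the fuzzy term 'cheap': 1 below 100, fading to 0 at 300."""
    if price <= 100:
        return 1.0
    if price >= 300:
        return 0.0
    return (300 - price) / 200

def query(rows, threshold):
    return [r["name"] for r in rows if cheap(r["price"]) >= threshold]

def flexible_query(rows, threshold=0.95, max_hits=5):
    """Relax on empty answers, tighten on overabundant ones."""
    for _ in range(10):                        # bounded reformulation loop
        ans = query(rows, threshold)
        if not ans:
            threshold -= 0.1                   # empty answer: relax
        elif len(ans) > max_hits:
            threshold += 0.1                   # overabundant answer: tighten
        else:
            return ans, round(threshold, 2)
    return ans, round(threshold, 2)

hotels = [{"name": "h0", "price": 150},
          {"name": "h1", "price": 210},
          {"name": "h2", "price": 280}]
print(flexible_query(hotels))   # (['h0'], 0.75): relaxed twice, then satisfied
```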
Abstract: Traditionally, the SQL query language is used to search the data in databases. However, it is inappropriate for end users, since it is complex and hard to learn. End users need to search databases with keywords, as in web search engines. This paper presents a survey of work on keyword search in databases, and also includes a brief introduction to the SEEKER system, which has been developed.
Funding: This work is supported by the National Natural Science Foundation of China under Grant Nos. 60473069 and 60496325.
Abstract: Keyword Search Over Relational Databases (KSORD) enables casual or Web users to easily access databases through free-form keyword queries. Improving the performance of KSORD systems is a critical issue in this area. In this paper, a new approach, CLASCN (Classification, Learning And Selection of Candidate Network), is developed to efficiently perform top-k keyword queries in schema-graph-based online KSORD systems. In this approach, the candidate networks (CNs) from trained keyword queries or executed user queries are classified and stored in the databases, and the top-k results from the CNs are learned for constructing CN language models (CNLMs). The CNLMs are used to compute similarity scores between a new user query and the CNs derived from the query. The CNs with relatively large similarity scores, which are the most promising ones for producing top-k results, are selected and executed. Currently, CLASCN is only applicable to past queries and New All-keyword-Used (NAU) queries, which are frequently submitted queries. Extensive experiments also show the efficiency and effectiveness of our CLASCN approach.
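A minimal sketch of the scoring step follows: each candidate network keeps a unigram language model learned from its past top-k results, and a new query is scored by the smoothed probability that model assigns to its keywords. The toy CNs and their term statistics are illustrative assumptions, not the paper's trained models.

```python
# Minimal sketch: ranking candidate networks by unigram language-model score.
from collections import Counter

class CNLM:
    """Unigram CN language model with add-one (Laplace) smoothing."""
    def __init__(self, past_result_terms: list[str], vocab_size: int = 1000):
        self.counts = Counter(past_result_terms)
        self.total = len(past_result_terms)
        self.v = vocab_size

    def score(self, query_keywords: list[str]) -> float:
        p = 1.0
        for kw in query_keywords:
            p *= (self.counts[kw] + 1) / (self.total + self.v)
        return p

cns = {
    "Paper-Author": CNLM("database keyword search ranking author".split() * 3),
    "Paper-Conf":   CNLM("conference venue proceedings database".split() * 3),
}
query = ["keyword", "search"]
# Execute only the most promising CNs for the top-k run.
ranked = sorted(cns, key=lambda n: cns[n].score(query), reverse=True)
print(ranked)   # ['Paper-Author', 'Paper-Conf']
```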
Abstract: This paper investigates the problem of ranking linked data from relational databases using a ranking framework. The core idea is to group relationships by their types, then rank the types, and finally rank the instances attached to each type. The ranking criteria for each step consider the mapping rules and the heterogeneous graph structure of the data web. Tests based on a social network dataset show that the linked data ranking is effective and easier for people to understand. This approach benefits from utilizing relationships deduced from mapping rules based on table schemas and from distinguishing relationship types, which results in better ranking and visualization of the linked data.
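The snippet below is a minimal sketch of the three-step idea: group edges by relationship type, rank the types, then rank the instances within each type. The frequency- and degree-based criteria are simplified stand-ins for the paper's mapping-rule- and graph-structure-based criteria.

```python
# Minimal sketch: two-level ranking of typed relationships and instances.
from collections import Counter, defaultdict

edges = [                      # (subject, relationship type, object)
    ("alice", "friendOf", "bob"), ("alice", "friendOf", "carol"),
    ("bob",   "friendOf", "carol"), ("alice", "friendOf", "dave"),
    ("alice", "worksAt", "acme"),
]

by_type = defaultdict(list)
for s, t, o in edges:
    by_type[t].append((s, o))

# Steps 1-2: group by relationship type and rank types by edge count.
ranked_types = sorted(by_type, key=lambda t: len(by_type[t]), reverse=True)
for t in ranked_types:
    # Step 3: rank instances attached to this type by their degree in it.
    degree = Counter()
    for s, o in by_type[t]:
        degree[s] += 1
        degree[o] += 1
    print(t, degree.most_common())
# friendOf [('alice', 3), ('bob', 2), ('carol', 2), ('dave', 1)] ; worksAt ...
```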
Abstract: This paper defines a new kind of rule, the probability functional dependency rule, which captures the degree of a functional dependency. Five algorithms, from the simple to the complex, are presented to mine this kind of rule under different conditions. The related theorems are proved to ensure the high efficiency and the correctness of the algorithms.
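As a minimal illustration of a dependency degree, the sketch below measures, for X -> Y, the fraction of tuples whose Y value agrees with the majority Y value of their X group; degree 1.0 is an exact functional dependency. This is one standard way to quantify approximate dependencies, offered as an assumption rather than the paper's exact definition.

```python
# Minimal sketch: confidence-style degree of a functional dependency X -> Y.
from collections import Counter, defaultdict

def fd_degree(rows: list[dict], x: str, y: str) -> float:
    groups = defaultdict(Counter)
    for r in rows:
        groups[r[x]][r[y]] += 1        # distribution of Y within each X group
    agree = sum(c.most_common(1)[0][1] for c in groups.values())
    return agree / len(rows)

rows = [
    {"zip": "10001", "city": "NY"}, {"zip": "10001", "city": "NY"},
    {"zip": "10001", "city": "Newark"},          # one violating tuple
    {"zip": "94105", "city": "SF"},
]
print(fd_degree(rows, "zip", "city"))   # 0.75: zip determines city with p=0.75
```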