Abstract: This book chapter is an extended version of the research paper entitled “Use of Component Integration Services in Multidatabase Systems”, which was presented at and published by the 13th ISITA, the National Conference of Recent Trends in Mathematical and Computer Sciences, T.M.B. University, Bhagalpur, India, January 3-4, 2015. Information is widely distributed across many remote, distributed, and autonomous databases (local component databases) in heterogeneous formats. Integrating heterogeneous remote databases is a difficult task, and several projects have already addressed it to some extent. In this chapter, we discuss how to integrate heterogeneous, distributed local relational databases, which we focus on because of their simplicity, strong security, performance, power, flexibility, data independence, support for new hardware technologies, and worldwide adoption. We also discuss how to constitute a global conceptual schema in a multidatabase system using the Component Integration Services (CIS) and OmniConnect of Sybase Adaptive Server Enterprise. This approach is feasible for higher education institutions as well as commercial industries. For higher educational institutions, CIS will improve IT integration with their subsidiaries or with other institutions within the country and abroad in terms of educational management, teaching, learning, and research, including promoting international students’ academic integration, collaboration, and governance. This will prove to be an innovative strategy for supporting the modernization and large-scale expansion of academic institutions, and it can be regarded as IT-institutional alignment within a higher education context. It will also support one of the Sustainable Development Goals set by the United Nations: “Goal 4: ensure inclusive and quality education for all and promote lifelong learning”.
However, the process of IT integration into higher educational institutions must be evaluated thoroughly, and the vital data access points must be identified. In this chapter, Section 1 provides an introduction covering the evolution of various database systems and data models and the emergence and importance of multidatabase systems. Section 2 discusses Component Integration Services (CIS) and OmniConnect, considering heterogeneous, distributed local relational databases from an academic perspective. Section 3 discusses Sybase Adaptive Server Enterprise (ASE). Section 4 discusses the role of the CIS and OmniConnect of Sybase ASE in a multidatabase system. Section 5 presents the database architectural framework. Section 6 provides an implementation overview of the global conceptual schema in the multidatabase system. Section 7 discusses query processing in CIS, and finally, Section 8 concludes the chapter. The chapter should be very helpful to our students, as it discusses the evolution of databases and data models and the emergence of multidatabases in detail. Where additional information is drawn from other works, the source of each citation is given in the references.
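CIS lets a Sybase ASE server expose tables on remote component servers as local proxy tables, over which a global conceptual schema can be defined. As a minimal, runnable analogue only (this is not Sybase syntax), the sketch below uses Python's sqlite3 module: two attached in-memory databases stand in for heterogeneous component databases, and a view plays the role of the global schema. All table, column, and view names are hypothetical.

```python
import sqlite3


def build_global_schema() -> sqlite3.Connection:
    """Build two toy 'component' databases and a global view over them.

    Hypothetical example: schemas, data and the view name are illustrative.
    """
    conn = sqlite3.connect(":memory:")
    # Each attached in-memory database stands in for a remote component
    # database (in ASE CIS this role is played by proxy tables).
    conn.execute("ATTACH DATABASE ':memory:' AS campus_a")
    conn.execute("ATTACH DATABASE ':memory:' AS campus_b")

    # Heterogeneous local schemas: different table and column names.
    conn.execute("CREATE TABLE campus_a.student (roll_no INTEGER, full_name TEXT)")
    conn.execute(
        "CREATE TABLE campus_b.learner (id INTEGER, given_name TEXT, family_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO campus_a.student VALUES (?, ?)",
        [(1, "Asha Kumari"), (2, "Ravi Singh")],
    )
    conn.executemany(
        "INSERT INTO campus_b.learner VALUES (?, ?, ?)", [(10, "Mei", "Lin")]
    )

    # Global conceptual schema: one uniform relation over both components.
    # A TEMP view is used because SQLite only lets temporary views
    # reference tables that live in other attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW global_student AS
        SELECT 'A' AS site, roll_no AS student_id, full_name AS name
          FROM campus_a.student
        UNION ALL
        SELECT 'B', id, given_name || ' ' || family_name
          FROM campus_b.learner
        """
    )
    return conn


conn = build_global_schema()
rows = conn.execute(
    "SELECT site, student_id, name FROM global_student ORDER BY student_id"
).fetchall()
print(rows)
```

Queries are written once against `global_student`; the view hides which component holds each row and how its local columns are named, which is the role a global conceptual schema plays in a multidatabase system.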
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 60573126, the National High-Tech Research and Development 863 Program of China under Grant No. 2004AA112010, and the National Grand Fundamental Research 973 Program of China under Grant No. 2002CB312005.
Abstract: An important task in database integration is to resolve data conflicts at both the schema level and the semantic level. The latter is especially difficult. Some existing ontology-based approaches have been criticized for their lack of domain generality and semantic richness. To overcome these limitations, this paper introduces a systematic approach for detecting and resolving various semantic conflicts in heterogeneous databases, which consists of two important parts: a semantic conflict representation model based on our classification framework of semantic conflicts, and a methodology for detecting and resolving semantic conflicts based on this model. The system has been implemented, and experimental evaluations indicate that the approach can resolve many semantic conflicts effectively while remaining independent of domains and integration patterns.
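The paper's representation model and classification framework are not detailed in the abstract. Purely as an illustration of the general idea, the sketch below attaches a hypothetical semantic context (concept, unit, scale) to each attribute, classifies unit and scale conflicts between two attributes, and resolves them by converting values into a target context. The class, the conflict labels, and the exchange rate are assumptions, not the authors' model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticContext:
    concept: str  # shared ontology concept, e.g. "salary"
    unit: str     # currency or measurement unit, e.g. "USD"
    scale: float  # factor applied to stored values, e.g. 1000 for "in thousands"


# Hypothetical conversion factors; the rate is illustrative only.
UNIT_FACTORS = {("CNY", "USD"): 0.14, ("USD", "CNY"): 1 / 0.14}


def detect_conflicts(a: SemanticContext, b: SemanticContext) -> list[str]:
    """Return the (toy) semantic conflict types between two attributes."""
    if a.concept != b.concept:
        return ["concept"]  # different meaning: values are not comparable
    conflicts = []
    if a.unit != b.unit:
        conflicts.append("unit")
    if a.scale != b.scale:
        conflicts.append("scale")
    return conflicts


def resolve(value: float, src: SemanticContext, dst: SemanticContext) -> float:
    """Convert a value stored under `src` semantics into `dst` semantics."""
    conflicts = detect_conflicts(src, dst)
    if "concept" in conflicts:
        raise ValueError("cannot map values between different concepts")
    base = value * src.scale  # remove the source scaling
    if "unit" in conflicts:
        base *= UNIT_FACTORS[(src.unit, dst.unit)]
    return base / dst.scale  # apply the target scaling


src = SemanticContext("salary", "CNY", 1000.0)  # stored in thousands of CNY
dst = SemanticContext("salary", "USD", 1.0)
print(detect_conflicts(src, dst))  # ['unit', 'scale']
print(resolve(50, src, dst))
```

Keeping detection separate from resolution mirrors the two-part structure the abstract describes: a representation that makes conflicts explicit, and a procedure that acts on them.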
Funding: Funded by the National 863 Program of China (No. 2002AA130406) and the Key Project of China Geological Survey (No. 200218310077).
Abstract: One of the core developments in geomathematics today is the use of digital data processing in mineral prospecting and assessment. Information discovery is based on multidisciplinary geoscientific data, so an integrated management approach is crucial. The lack of a standard description hinders interoperation in database search and discovery. A metadata hierarchy aims to provide a standard view of geoscientific data and to facilitate data description and discovery. In the research on an integrated geoscientific database, the metadata hierarchy used a standardized description for each collection in the content structure, realized in the semantic structure. It recorded both dataset identification and the inner structures and relationships of objects, and thus differed from many other applications. There were four tiers in the content structure and three levels in the semantic structure. With its help, database users could determine how applicable a dataset is to a project and refine their queries to the database. The effectiveness of data access is significantly enhanced through rich, consistent metadata.
Abstract: The construction of an integrated database including casting shapes with their casting designs, technical knowledge, and thermophysical properties of casting alloys is introduced in the present study. A recognition technique for casting design using industrial computed tomography was used to construct the shape database. Technical knowledge of casting processes for ferrous and non-ferrous alloys, and of the manufacturing processes of the castings, was accumulated, and a search engine for this knowledge was developed. A database of thermophysical properties of the casting alloys was obtained through experimental study, and the properties were used for in-house computer simulation of the casting process. The databases were linked with an intelligent casting expert system developed at the Center for e-Design, KITECH. The databases are expected to help non-experts in casting to devise castings and their processes. Various examples of applications using the databases are presented.
Funding: Supported by a grant from the National Natural Science Foundation of China (Grant No. 61373057) and a grant from the Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1110763).
Abstract: Objective: Identifying colorectal cancer (CRC) metastasis genes is one of the most important issues in CRC research. To mine CRC metastasis-associated genes, an integrated analysis of microarray data, combined with evidence acquired from comparative genomic hybridization (CGH) data, is presented. Methods: Gene expression profile data of CRC samples were obtained from the Gene Expression Omnibus (GEO) website. The 15 important chromosomal aberration sites detected using CGH technology were used for integrated genomic and transcriptomic analysis. Significance Analysis of Microarrays (SAM) was used to detect significantly differentially expressed genes across the whole genome. The overlapping genes located in the corresponding chromosomal aberration regions were selected and analyzed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, the SVM-T-RFE gene selection algorithm was applied to identify metastasis-associated genes in CRC. Results: A minimal gene set of 14 genes was obtained, with the highest classification accuracy (100%) on both the PRI and META datasets. A fraction of the selected genes are associated with CRC or its metastasis. Conclusions: Our results demonstrate that integrated analysis is an effective strategy for mining cancer-associated genes.
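The step that keeps only differentially expressed genes lying inside CGH aberration regions amounts to an interval-overlap filter. A minimal sketch, assuming genes and regions are given as chromosome coordinates; the gene symbols and coordinates below are placeholders, not data from the study, and the SAM and SVM-T-RFE steps are not reproduced.

```python
from __future__ import annotations

from typing import Iterable, List, NamedTuple, Set


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int
    end: int


def overlaps(gene: Gene, region: Region) -> bool:
    """True if the gene interval intersects the aberration region."""
    return (
        gene.chrom == region.chrom
        and gene.start <= region.end
        and region.start <= gene.end
    )


def genes_in_aberrations(
    genes: Iterable[Gene], de_symbols: Set[str], regions: List[Region]
) -> List[str]:
    """Keep differentially expressed genes lying in at least one aberration region."""
    return sorted(
        g.symbol
        for g in genes
        if g.symbol in de_symbols and any(overlaps(g, r) for r in regions)
    )


# Placeholder data, not from the study.
genes = [Gene("G1", "8", 100, 200), Gene("G2", "8", 500, 600), Gene("G3", "20", 50, 80)]
de = {"G1", "G3"}  # e.g. genes flagged by a differential-expression test
regions = [Region("8", 150, 400), Region("20", 1, 40)]
print(genes_in_aberrations(genes, de, regions))  # ['G1']
```

G2 is excluded because it was not flagged as differentially expressed, and G3 because it lies outside the chromosome 20 region.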
Abstract: Currently, schema integration frameworks use approaches such as rule-based methods and machine learning. This paper presents an ontology-based wrapper-mediator framework that uses both rule-based and machine learning strategies simultaneously. The proposed framework uses global and local ontologies to resolve syntactic and semantic heterogeneity, and XML for interoperability. The concepts in the candidate schemas are merged on the basis of a similarity coefficient, which is calculated using the defined rules and the prior mappings stored in the case base.
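The abstract does not give the similarity coefficient's formula. As a hypothetical sketch of combining a rule-based score with prior mappings from a case base, the code below uses token Jaccard similarity of element names as the single rule and adds a fixed bonus when a pair was mapped before; the weights, the threshold, and all schema names are assumptions.

```python
from __future__ import annotations

import re
from typing import Dict, List, Set, Tuple

CaseBase = Set[Tuple[str, str]]  # previously confirmed (source, target) mappings


def tokens(name: str) -> Set[str]:
    """Split camelCase or snake_case identifiers into lowercase tokens."""
    return {p.lower() for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)}


def rule_similarity(a: str, b: str) -> float:
    """Illustrative rule: Jaccard overlap of name tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similarity_coefficient(
    a: str, b: str, case_base: CaseBase, w_rule: float = 0.6, w_case: float = 0.4
) -> float:
    """Weighted blend of the rule score and a case-base hit (weights assumed)."""
    prior = 1.0 if (a, b) in case_base or (b, a) in case_base else 0.0
    return w_rule * rule_similarity(a, b) + w_case * prior


def merge_candidates(
    schema_a: List[str], schema_b: List[str], case_base: CaseBase, threshold: float = 0.35
) -> Dict[str, str]:
    """Map each concept in schema_a to its best match in schema_b above the threshold."""
    merged: Dict[str, str] = {}
    for a in schema_a:
        scores = {b: similarity_coefficient(a, b, case_base) for b in schema_b}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            merged[a] = best
    return merged


case_base = {("dateOfBirth", "DOB")}
print(merge_candidates(["studentName", "dateOfBirth"], ["student_name", "DOB", "address"], case_base))
```

Here `studentName` and `student_name` match through the naming rule alone, while `dateOfBirth` and `DOB` share no tokens and are matched only because the case base records an earlier mapping.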
Abstract: Fundamentally, a semantic grid database brings globally distributed databases together to coordinate resource sharing and problem solving, with information given well-defined meaning. DartGrid II is an implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query processing to minimize the cost and/or response time of answering a query in a distributed database system, query optimization in a database grid is fundamentally different from traditional distributed query optimization. These differences are shown to be consequences of the autonomy and heterogeneity of the database nodes in the grid. Therefore, query optimization in a database grid poses more challenges than in a traditional distributed database. Following this observation, the design of a query optimizer in DartGrid II is presented, and a heuristic, dynamic, and parallel approach to query processing in the database grid is proposed. A set of semantic tools supporting relational database integration and semantics-based information browsing has also been implemented to realize this vision.
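DartGrid II's optimizer is not specified in the abstract. The sketch below only illustrates two generic ideas the abstract names: parallel execution of sub-queries at autonomous nodes, and a heuristic for combining their partial results. Sub-queries run concurrently with `concurrent.futures`, and partial results are joined smallest first. The node functions, the data, and the smallest-first rule are hypothetical stand-ins, not the DartGrid II algorithm.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


def fetch_all(subqueries: Dict[str, Callable[[], List[Row]]]) -> Dict[str, List[Row]]:
    """Send every sub-query to its (simulated) node concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in subqueries.items()}
        return {name: f.result() for name, f in futures.items()}


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equi-join two row lists on `key` using a hash index built on `right`."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def greedy_join(partials: Dict[str, List[Row]], key: str) -> List[Row]:
    """Heuristic: join the smallest partial results first to keep intermediates small."""
    ordered = sorted(partials.values(), key=len)
    result = ordered[0]
    for rel in ordered[1:]:
        result = hash_join(result, rel, key)
    return result


# Hypothetical nodes; in a real grid these would be remote database calls.
nodes = {
    "students": lambda: [{"sid": 1, "name": "A"}, {"sid": 2, "name": "B"}, {"sid": 3, "name": "C"}],
    "grades": lambda: [{"sid": 1, "grade": "A+"}, {"sid": 3, "grade": "B"}],
}
answer = greedy_join(fetch_all(nodes), key="sid")
print(answer)
```

A dynamic optimizer would revise this order as actual result sizes arrive from autonomous nodes, since their statistics are often unavailable in advance; the static sort here is the simplest possible version.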
Abstract: Integrated computational materials engineering (ICME) has achieved great success in accelerating the rational design and deployment of new materials. It is a new route for designing new materials and processes, highlighted by the Materials Genome Initiative/Engineering, which stresses high-throughput computation in addition to high-throughput experimentation and materials informatics. This article briefly reviews the basic theories and multi-scale computational tools of ICME for designing advanced steel grades, including first-principles calculations, the CALPHAD method (i.e., computational thermodynamics) fueled by dedicated databases, diffusion and phase-field simulations, finite element analysis methods, and machine learning. In the ICME scheme for steels, the CALPHAD method is considered the core, because it readily handles multi-component systems and links microscopic simulations (such as diffusion and phase-field methods that predict microstructure evolution in response to external conditions) with macroscopic finite element analysis of mechanical properties. Two applications are also presented to illustrate new routes for materials design, especially for advanced steels.
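As background for the CALPHAD core mentioned above (not taken from the article), the molar Gibbs energy of a binary A-B substitutional solution phase is commonly modeled as a reference term, an ideal mixing term, and a Redlich-Kister excess term: G = x_A·G_A + x_B·G_B + RT(x_A ln x_A + x_B ln x_B) + x_A·x_B·Σ_k L_k (x_A - x_B)^k. A minimal sketch evaluating it, assuming the interaction parameters L_k are given constants; real assessments make them temperature dependent and take them from a thermodynamic database.

```python
import math
from typing import Sequence

R = 8.314462618  # gas constant, J/(mol K)


def gibbs_binary(
    x_b: float, T: float, g_a: float, g_b: float, L: Sequence[float]
) -> float:
    """Molar Gibbs energy (J/mol) of a binary A-B substitutional solution.

    x_b      mole fraction of B (strictly between 0 and 1)
    T        temperature in K
    g_a, g_b reference Gibbs energies of pure A and B at T (J/mol)
    L        Redlich-Kister parameters [L0, L1, ...] in J/mol (assumed constant)
    """
    if not 0.0 < x_b < 1.0:
        raise ValueError("x_b must lie strictly between 0 and 1")
    x_a = 1.0 - x_b
    reference = x_a * g_a + x_b * g_b
    ideal = R * T * (x_a * math.log(x_a) + x_b * math.log(x_b))
    excess = x_a * x_b * sum(Lk * (x_a - x_b) ** k for k, Lk in enumerate(L))
    return reference + ideal + excess


# Illustrative parameters only: an ideal-plus-regular solution at 1200 K.
print(gibbs_binary(0.25, 1200.0, 0.0, 0.0, [-20000.0]))
```

At x_B = 0.5 the odd-order Redlich-Kister terms vanish, which is a quick sanity check when fitting parameters.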
Funding: Supported by the Spanish Ministry of Innovation and Science through REALIDAD: Gestion, Analisis y Explotacion Eficiente de Datos Vinculados under Grant No. TIN2011-25840.
Abstract: The problem of matching schemas or ontologies consists of identifying corresponding entities in two or more knowledge models that belong to the same domain but have been developed separately. Many techniques and tools now exist for addressing this problem; however, the complex nature of the matching problem makes existing solutions not fully satisfactory in real situations. The Google Similarity Distance has appeared recently. Its purpose is to mine knowledge from the Web using the Google search engine in order to compare text expressions semantically. Our work consists of developing a software application for validating results discovered by schema and ontology matching tools, using the philosophy behind this distance. Moreover, we are interested in using not only Google but also other popular search engines with this similarity distance. The results reveal three main facts. First, some web search engines can help us validate semantic correspondences satisfactorily. Second, there are significant differences among the web search engines. Third, the best results are obtained when using combinations of the web search engines that we have studied.
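The Google Similarity Distance is usually computed as the Normalized Google Distance defined by Cilibrasi and Vitányi: NGD(x, y) = [max(log f(x), log f(y)) - log f(x, y)] / [log N - min(log f(x), log f(y))], where f(x) and f(y) are the hit counts for each term, f(x, y) is the hit count for pages containing both, and N is the number of pages indexed. A minimal sketch, assuming hit counts have already been retrieved (no search-engine API is called); averaging across engines is just one simple combination, not necessarily the one the authors used.

```python
import math
from typing import Sequence, Tuple

# (f(x), f(y), f(x, y), N) hit counts from one search engine
HitCounts = Tuple[float, float, float, float]


def normalized_google_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """NGD from hit counts; 0 means the terms always co-occur."""
    if min(fx, fy, fxy) <= 0:
        return float("inf")  # no co-occurrence: treat as maximally distant
    lx, ly, lxy, ln = (math.log(v) for v in (fx, fy, fxy, n))
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def combined_distance(counts_per_engine: Sequence[HitCounts]) -> float:
    """Average the NGD obtained from several engines (assumed combination rule)."""
    distances = [normalized_google_distance(*c) for c in counts_per_engine]
    return sum(distances) / len(distances)


# Hypothetical hit counts for one candidate correspondence from two engines.
print(combined_distance([(1000, 1000, 1000, 10**9), (100, 1000, 10, 10000)]))
```

A matching tool's proposed correspondence could then be accepted when the combined distance falls below a chosen threshold; the threshold would have to be calibrated per engine because, as the abstract notes, engines differ significantly.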
Funding: Supported by the National Natural Science Foundation of China (No. 40871212, No. 40671158) and the Leading Academic Discipline Project of Shanghai Educational Committee (No. J50104).
Abstract: The way we interact with spatial data has changed from 2D maps to 3D Virtual Geographic Environments (VGE). Three-dimensional representations of geographic information on a computer are known as VGE; in particular, 3D city models provide an efficient way to integrate massive, heterogeneous geospatial information and georeferenced information in urban areas. 3D city modeling (3DCM) is an active research and practice topic across distinct application areas. This paper introduces the different modeling paradigms employed in 3D GIS, virtual environments, and AEC/FM. Up-to-date 3DCM technologies are evolving toward a data-integration and collaborative approach that represents the full spatial coverage of a city, modeling aboveground and underground as well as outdoor and indoor environments, including man-made objects and natural features, with 3D geometry, appearance, topology, and semantics.
Funding: Supported by the National High Technology Research and Development Program of China (2012AA020201), the National Basic Research Program of China (2013CB910802, 2010CB912700), the International Science & Technology Cooperation Program of China (2014DFB30020), the National Natural Science Foundation of China (31000379, 31000587, 31000591), and the Chinese State Key Project Specialized for Infectious Diseases (2012ZX10002012-006).
Abstract: Integrating pathway and protein-protein interaction (PPI) data can provide more information that could lead to new biological insights. PPIs are usually represented by a simple binary model, whereas pathways are represented by more complicated models. We developed a series of rules for transforming protein interactions from the pathway model to the binary model, and the protein interactions from seven pathway databases, namely PID, BioCarta, Reactome, NetPath, INOH, SPIKE, and KEGG, were transformed based on these rules. These pathway-derived binary protein interactions were integrated with PPIs from five other PPI databases, namely HPRD, IntAct, BioGRID, MINT, and DIP, to develop an integrated dataset named PathPPI. PathPPI preserves more detailed interaction-type and modification information on protein interactions than other existing datasets. Comparative analysis indicates that most of the interaction overlap values (OAB) among these pathway databases were less than 5%, so these databases must be used in conjunction. The PathPPI data are available at http://proteomeview.hupo.org.cn/PathPPI/PathPPI.html.
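The transformation rules themselves are not listed in the abstract. One common way to turn a multi-protein pathway event or complex into binary interactions is the matrix model, which pairs every participant with every other; the sketch below assumes that model and, because the exact definition of OAB is not given here, uses the Jaccard index as a stand-in overlap measure. Protein identifiers are placeholders.

```python
from __future__ import annotations

from itertools import combinations
from typing import FrozenSet, Iterable, Set

Pair = FrozenSet[str]  # undirected binary interaction


def to_binary(participants: Iterable[str]) -> Set[Pair]:
    """Expand one pathway event or complex into binary pairs (matrix model, assumed)."""
    proteins = sorted(set(participants))
    return {frozenset(pair) for pair in combinations(proteins, 2)}


def overlap(a: Set[Pair], b: Set[Pair]) -> float:
    """Jaccard overlap of two interaction sets (stand-in for OAB, an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder data: one complex from database A, two interactions from database B.
db_a = to_binary(["P1", "P2", "P3"])
db_b = {frozenset({"P1", "P2"}), frozenset({"P4", "P5"})}
print(len(db_a), overlap(db_a, db_b))  # 3 0.25
```

Using frozensets makes each interaction undirected, so the same pair reported as A-B in one database and B-A in another is counted once when sets are compared.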