Human-centric services are an important domain in smart cities and include rich applications that help residents with shopping, dining, transportation, entertainment, and other daily activities. These applications have generated a massive amount of hierarchical data with different schemas. To manage and analyze city-wide, cross-application data in a unified way, data schema integration is necessary. However, data from human-centric services has some distinct characteristics, such as a lack of support for semantic matching, a large number of schemas, and incomplete schema element labels. These make schema integration difficult with existing approaches. We propose a novel framework for data schema integration of human-centric services in smart cities. The framework uses both schema metadata and instance data for schema matching, and introduces human intervention based on a similarity entropy criterion to balance precision and efficiency. Moreover, the framework works incrementally to reduce the computation workload. We conduct an experiment with a real-world dataset collected from multiple estate sale application systems. The results show that our approach can produce a high-quality mediated schema with relatively fewer human interventions than the baseline method.
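The abstract does not define the similarity entropy criterion, so the following is only a minimal sketch of one plausible reading: the similarity scores between one source element and its candidate matches are normalized into a distribution, and a high Shannon entropy (no clear winner) triggers human review. The function names, the normalization, and the threshold value are all assumptions made for illustration, not the authors' method.

```python
import math


def similarity_entropy(scores):
    """Shannon entropy (in bits) of similarity scores normalized to sum to 1.

    Assumption: scores are non-negative similarities between one source
    schema element and each candidate target element.
    """
    total = sum(scores)
    if total <= 0:
        # No candidate shows any similarity: treat as maximally uncertain.
        return math.log2(len(scores)) if len(scores) > 1 else 0.0
    probs = [s / total for s in scores]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def needs_human_review(scores, threshold=1.0):
    """Hypothetical rule: ask a human when the entropy exceeds the threshold."""
    return similarity_entropy(scores) > threshold


if __name__ == "__main__":
    confident = [0.9, 0.05, 0.05]   # one candidate clearly dominates
    ambiguous = [0.4, 0.35, 0.3]    # several candidates are nearly tied
    print(needs_human_review(confident), needs_human_review(ambiguous))
```

Under this reading, a lower threshold sends more matches to a human (favoring precision), while a higher one automates more of them (favoring efficiency).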
When tens or even hundreds of schemas are involved in the integration process, criteria are needed for choosing clusters of schemas to be integrated, so that the integration problem can be handled through an efficient iterative process. Schemas in clusters should be chosen according to cohesion and coupling criteria based on similarities and dissimilarities among schemas. In this paper, we propose an algorithm for a novel variant of the correlation clustering approach that addresses the problem of assisting a designer in integrating a large number of conceptual schemas. The novel variant introduces upper and lower bounds on the number of schemas in each cluster, in order to avoid integration contexts that are too complex or too simple, respectively. Because the problem is an NP-hard combinatorial problem, we give a heuristic for solving it. An experimental activity demonstrates an appreciable increase in the effectiveness of the schema integration process when clusters are computed by the proposed algorithm rather than defined manually by an expert.
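To make the constrained problem concrete, here is a small sketch of the standard correlation clustering objective (counting pairwise disagreements) together with a check of the cluster-size bounds. It only evaluates a given partition and is not the heuristic proposed in the paper; the function names and the representation of similar pairs as a set of `frozenset` objects are assumptions chosen for illustration.

```python
from itertools import combinations


def disagreement_cost(clusters, similar_pairs):
    """Count pairs the partition treats inconsistently with the judgments.

    A disagreement is a similar pair split across clusters, or a
    dissimilar pair placed in the same cluster.
    """
    label = {}
    for idx, cluster in enumerate(clusters):
        for schema in cluster:
            label[schema] = idx
    cost = 0
    for a, b in combinations(sorted(label), 2):
        together = label[a] == label[b]
        similar = frozenset((a, b)) in similar_pairs
        if together != similar:
            cost += 1
    return cost


def within_bounds(clusters, lower, upper):
    """True if every cluster holds between lower and upper schemas."""
    return all(lower <= len(c) <= upper for c in clusters)


if __name__ == "__main__":
    # Hypothetical similarity judgments among four schemas.
    similar = {frozenset("AB"), frozenset("CD"), frozenset("BC")}
    candidate = [{"A", "B"}, {"C", "D"}]
    print(disagreement_cost(candidate, similar), within_bounds(candidate, 2, 3))
```

A heuristic for the bounded variant would search over partitions that satisfy `within_bounds` while trying to keep `disagreement_cost` low.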
Panorama is a multidatabase system (MDBS) developed at HUST. The project aims to achieve interoperability among existing, heterogeneous, federated database management systems such as Oracle8, Sybase, and DM2 (a database management system developed at HUST, Wuhan, China). The system is based on OMG's distributed object management architecture and is implemented on top of a CORBA-compliant ORB, namely VisiBroker, which serves as its infrastructure. Panorama provides its users with a single common data model and a single global query language named PanoSQL, which make it possible to incorporate different databases into the system. The main components of the system are interfaces for the local DBMSs that participate in Panorama, a transaction manager, a common data model, a schema integrator, a global query language, and global query processing and optimization. We first discuss the architecture and components of the Panorama system. We then discuss schema integration in this system, and extend the discussion to the query language, transaction management, and query processing developed for it. Finally, we give a conclusion and outline future work for the system.
Funding: National High Technology Research and Development Program of China (863), Grant No. 2013AA01A605