With the rapid development of Web, there are more and more Web databases available for users to access. At the same time, job searchers often have difficulties in first finding the right sources and then querying over...With the rapid development of Web, there are more and more Web databases available for users to access. At the same time, job searchers often have difficulties in first finding the right sources and then querying over them, providing such an integrated job search system over Web databases has become a Web application in high demand. Based on such consideration, we build a deep Web data integration system that supports unified access for users to multiple job Web sites as a job meta-search engine. In this paper, the architecture of the system is given first, and the key components in the system are introduced.展开更多
To facilitate users to access the desired information, many researches have dedicated to the Deep Web (i.e. Web databases) integration. We focus on query translation which is an important part of the Deep Web integr...To facilitate users to access the desired information, many researches have dedicated to the Deep Web (i.e. Web databases) integration. We focus on query translation which is an important part of the Deep Web integration. Our aim is to construct automatically a set of constraints mapping rules so that the system can translate the query from the integrated interface to the Web database interfaces based on them. We construct a concept hierarchy for the attributes of the query interfaces, especially, store the synonyms and the types (e.g. Number, Text, etc.) for every concept At the same time, we construct the data hierarchies for some concepts if necessary. Then we present an algorithm to generate the constraint mapping rules based on these hierarchies. The approach is suitable for the scalability of such application and can be extended easily from one domain to another for its domain independent feature. The results of experiment show its effectiveness and efficiency.展开更多
How to integrate heterogeneous semi-structured Web records into relational database is an important and challengeable research topic. An improved model of conditional random fields was presented to combine the learnin...How to integrate heterogeneous semi-structured Web records into relational database is an important and challengeable research topic. An improved model of conditional random fields was presented to combine the learning of labeled samples and unlabeled database records in order to reduce the dependence on tediously hand-labeled training data. The pro- posed model was used to solve the problem of schema matching between data source schema and database schema. Experimental results using a large number of Web pages from diverse domains show the novel approach's effectiveness.展开更多
Matching query interfaces is a crucial step in data integration across multiple Web databases. Different types of information about query interface schemas have been used to match attributes between schemas. Relying o...Matching query interfaces is a crucial step in data integration across multiple Web databases. Different types of information about query interface schemas have been used to match attributes between schemas. Relying on a single aspect of information is not sufficient and the matching results of individual matchers are often inaccurate and uncertain. The evidence theory is the state-of-the-art approach for combining multiple sources of uncertain information. However, traditional evidence theory has the limitations of treating individual matchers in different matching tasks equally for query interface matching, which reduces matching performance. This paper proposes a novel query interface matching approach based on extended evidence theory for Deep Web. Our approach firstly introduces the dynamic prediction procedure of different matchers' credibilities. Then, it extends traditional evidence theory with the credibilities and uses exponentially weighted evidence theory to combine the results of multiple matchers. Finally, it performs matching decision in terms of some heuristics to obtain the final matches. Our approach overcomes the shortage of traditional method and can adapt to different matching tasks. Experimental results demonstrate the feasibility and effectiveness of our proposed approach.展开更多
基金Supportted by the Natural Science Foundation ofChina (60573091 ,60273018) National Basic Research and Develop-ment Programof China (2003CB317000) the Key Project of Minis-try of Education of China (03044) .
文摘With the rapid development of Web, there are more and more Web databases available for users to access. At the same time, job searchers often have difficulties in first finding the right sources and then querying over them, providing such an integrated job search system over Web databases has become a Web application in high demand. Based on such consideration, we build a deep Web data integration system that supports unified access for users to multiple job Web sites as a job meta-search engine. In this paper, the architecture of the system is given first, and the key components in the system are introduced.
基金Supported by the National Natural Science Foundation of China (60573091)the Natural Science Foundation of Beijing(4073035)the Key Project of Ministry of Education of China (03044)
文摘To facilitate users to access the desired information, many researches have dedicated to the Deep Web (i.e. Web databases) integration. We focus on query translation which is an important part of the Deep Web integration. Our aim is to construct automatically a set of constraints mapping rules so that the system can translate the query from the integrated interface to the Web database interfaces based on them. We construct a concept hierarchy for the attributes of the query interfaces, especially, store the synonyms and the types (e.g. Number, Text, etc.) for every concept At the same time, we construct the data hierarchies for some concepts if necessary. Then we present an algorithm to generate the constraint mapping rules based on these hierarchies. The approach is suitable for the scalability of such application and can be extended easily from one domain to another for its domain independent feature. The results of experiment show its effectiveness and efficiency.
基金Supported by the National Defense Pre-ResearchFoundation of China(4110105018)
文摘How to integrate heterogeneous semi-structured Web records into relational database is an important and challengeable research topic. An improved model of conditional random fields was presented to combine the learning of labeled samples and unlabeled database records in order to reduce the dependence on tediously hand-labeled training data. The pro- posed model was used to solve the problem of schema matching between data source schema and database schema. Experimental results using a large number of Web pages from diverse domains show the novel approach's effectiveness.
基金Supported by the National Natural Science Foundation of China under Grant No. 90818001the Natural Science Foundation of Shandong Province of China under Grant No. Y2007G24
文摘Matching query interfaces is a crucial step in data integration across multiple Web databases. Different types of information about query interface schemas have been used to match attributes between schemas. Relying on a single aspect of information is not sufficient and the matching results of individual matchers are often inaccurate and uncertain. The evidence theory is the state-of-the-art approach for combining multiple sources of uncertain information. However, traditional evidence theory has the limitations of treating individual matchers in different matching tasks equally for query interface matching, which reduces matching performance. This paper proposes a novel query interface matching approach based on extended evidence theory for Deep Web. Our approach firstly introduces the dynamic prediction procedure of different matchers' credibilities. Then, it extends traditional evidence theory with the credibilities and uses exponentially weighted evidence theory to combine the results of multiple matchers. Finally, it performs matching decision in terms of some heuristics to obtain the final matches. Our approach overcomes the shortage of traditional method and can adapt to different matching tasks. Experimental results demonstrate the feasibility and effectiveness of our proposed approach.