Funding: Supported by the National Natural Science Foundation of China (60573091), the Natural Science Foundation of Beijing (4073035), and the Key Project of the Ministry of Education of China (03044).
Abstract: To help users access the information they need, much research has been dedicated to Deep Web (i.e., Web database) integration. We focus on query translation, an important part of Deep Web integration. Our aim is to automatically construct a set of constraint mapping rules so that the system can use them to translate queries from the integrated interface to the Web database interfaces. We construct a concept hierarchy for the attributes of the query interfaces and, in particular, store the synonyms and types (e.g., Number, Text) for every concept. At the same time, we construct data hierarchies for some concepts where necessary. We then present an algorithm that generates the constraint mapping rules from these hierarchies. The approach scales well and, being domain independent, can easily be extended from one domain to another. Experimental results show its effectiveness and efficiency.
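The translation step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the concept names, synonym sets, and the constraint shape are all invented for the example.

```python
# Hypothetical concept hierarchy fragment: each concept stores a data type
# and a synonym set, as the abstract describes. All entries are illustrative.
CONCEPTS = {
    "price": {"type": "Number", "synonyms": {"price", "cost", "amount"}},
    "title": {"type": "Text", "synonyms": {"title", "book name", "name"}},
}

def resolve_concept(attribute_label):
    """Map a site-specific attribute label to a concept via the synonym sets."""
    label = attribute_label.strip().lower()
    for concept, info in CONCEPTS.items():
        if label in info["synonyms"]:
            return concept
    return None

def translate(constraint, site_attributes):
    """Rewrite an integrated-interface constraint {concept, op, value} onto
    whichever attribute of a target site resolves to the same concept."""
    for attr in site_attributes:
        if resolve_concept(attr) == constraint["concept"]:
            return {"attribute": attr, "op": constraint["op"],
                    "value": constraint["value"]}
    return None  # no mapping rule applies for this site
```

For example, a price bound on the integrated interface would be rewritten onto a site whose form field is labeled "Cost", because both labels resolve to the same concept.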
Funding: The National Natural Science Foundation of China (No. 60673130), the Natural Science Foundation of Shandong Province (No. Y2006G29, Y2007G24, Y2007G38), and the Encouragement Fund for Young Scholars of Shandong Province (No. 2005BS01002).
Abstract: Deep Web data integration requires schema matching on Web query interfaces to obtain a mapping table. By introducing semantic conflicts into Web query interface integration and discussing their origins and categories, an ontology-based schema matching method is proposed. The process is explained in detail using the example of Web query interface integration in the house domain. Conflicts are detected automatically by checking the semantic relevance degree; the category of each conflict is then identified and a message is sent to the conflict solver, which eliminates the conflict and obtains the mapping table using conflict-solving rules. The proposed method is simple, easy to implement, and can be flexibly reused in different domains by extending the ontology.
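One plausible reading of "detecting conflicts by checking the semantic relevance degree" is sketched below. The ontology fragment, the Jaccard-style relevance measure, and the threshold are assumptions for illustration; the abstract does not specify the actual measure.

```python
# Toy house-domain ontology fragment: concept -> set of related terms.
# Entirely illustrative; a real ontology would be far richer.
ONTOLOGY = {
    "area": {"area", "size", "square meters", "floor space"},
    "rent": {"rent", "price", "monthly rent"},
}

def relevance(label_a, label_b):
    """Jaccard overlap of the ontology neighborhoods of two attribute labels."""
    def neighborhood(label):
        terms = {label.lower()}
        for concept, related in ONTOLOGY.items():
            if label.lower() in related or label.lower() == concept:
                terms |= related | {concept}
        return terms
    na, nb = neighborhood(label_a), neighborhood(label_b)
    return len(na & nb) / len(na | nb)

def detect_conflict(label_a, label_b, threshold=0.5):
    """Two attributes that are semantically relevant but lexically different
    are flagged as a naming conflict and handed to the conflict solver."""
    if label_a.lower() == label_b.lower():
        return None
    return "naming-conflict" if relevance(label_a, label_b) >= threshold else None
```

Under this sketch, "size" and "area" are flagged as a naming conflict because their ontology neighborhoods coincide, while unrelated labels pass through silently.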
Funding: Supported by the Natural Science Foundation of China (60573091, 60273018), the National Basic Research and Development Program of China (2003CB317000), and the Key Project of the Ministry of Education of China (03044).
Abstract: With the rapid development of the Web, more and more Web databases are available for users to access. At the same time, job seekers often have difficulty first finding the right sources and then querying over them, so an integrated job search system over Web databases has become a Web application in high demand. Based on this consideration, we build a Deep Web data integration system that gives users unified access to multiple job Web sites, acting as a job meta-search engine. In this paper, the architecture of the system is given first, and then the key components of the system are introduced.
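The dispatch step of such a meta-search engine can be sketched as below. The site names, field mappings, and adapter signature are illustrative assumptions, not the system's real components; a real adapter would submit the translated query to the site's search form.

```python
def make_adapter(site, field_map):
    """Build an adapter that rewrites unified field names to a site's own.
    `field_map` maps unified-interface fields to site-specific ones."""
    def adapter(query):
        translated = {field_map.get(k, k): v for k, v in query.items()}
        # A real adapter would submit `translated` to the site's search form
        # and parse the result page; here we just echo what would be sent.
        return {"site": site, "query": translated}
    return adapter

# Hypothetical job sites with differing form-field names.
ADAPTERS = [
    make_adapter("jobs-site-a", {"keyword": "q", "city": "location"}),
    make_adapter("jobs-site-b", {"keyword": "search", "city": "area"}),
]

def meta_search(query):
    """Fan one unified job query out to every site adapter and collect results."""
    return [adapter(query) for adapter in ADAPTERS]
```

A single unified query such as `{"keyword": "engineer", "city": "Beijing"}` is thus rewritten once per site, which is the essence of the unified-access layer the abstract describes.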
Abstract: The fourth International Conference on Web Information Systems and Applications (WISA 2007) received 409 submissions and accepted 37 papers for publication in this issue. The papers cover broad research areas, including Web mining and data warehousing, Deep Web and Web integration, P2P networks, text processing and information retrieval, as well as Web services and Web infrastructure. After briefly introducing the WISA conference, this survey outlines current activities and future trends in Web information systems and applications, based on the papers accepted for publication.
Funding: Supported by the National Defense Pre-Research Foundation of China (4110105018).
Abstract: How to integrate heterogeneous, semi-structured Web records into a relational database is an important and challenging research topic. An improved conditional random field model is presented that combines learning from labeled samples with learning from unlabeled database records, in order to reduce dependence on tediously hand-labeled training data. The proposed model is used to solve the problem of schema matching between a data source schema and the database schema. Experimental results on a large number of Web pages from diverse domains show the approach's effectiveness.
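A conditional random field is too large for a short sketch, so the toy scorer below only mirrors the underlying idea: blend evidence from labeled data (field-name similarity) with evidence drawn from unlabeled database records (value overlap). The similarity measures, weight, and data are all assumptions, not the paper's model.

```python
def name_similarity(a, b):
    """Character-bigram Dice similarity between two field names."""
    def bigrams(s):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}
    ba, bb = bigrams(a), bigrams(b)
    return 2 * len(ba & bb) / (len(ba) + len(bb)) if ba and bb else 0.0

def value_overlap(source_values, column_values):
    """Fraction of source values also observed in the (unlabeled) DB column."""
    col = {v.lower() for v in column_values}
    return sum(v.lower() in col for v in source_values) / len(source_values)

def match_score(src_field, src_values, db_column, db_values, w=0.5):
    """Blend name evidence and value evidence with a fixed weight `w`."""
    return (w * name_similarity(src_field, db_column)
            + (1 - w) * value_overlap(src_values, db_values))
```

The appeal of using unlabeled records, as in the paper, is that `value_overlap` costs no annotation effort: the database contents themselves supply the evidence.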
Funding: Supported by the Key Scientific and Technological Research Projects of the Jiangxi Provincial Department of Education (GJJ2200303) and the National Social Science Foundation Major Bidding Project (20&ZD068).
Abstract: Inferring the fully qualified names (FQNs) of undeclared receiving objects and non-fully-qualified type names (non-FQNs) in partial code is critical for effectively searching, understanding, and reusing partial code. Existing type inference tools, such as COSTER and SNR, rely on a symbolic knowledge base and adopt a dictionary-lookup strategy to map the simple names of undeclared receiving objects and non-FQNs to FQNs. However, building a symbolic knowledge base requires parsing compilable code files, which limits the collection of APIs and code contexts and results in out-of-vocabulary (OOV) failures. To overcome these limitations, we implemented Ask Me Any Type (AMAT), a type-inference plugin embedded in Web browsers and integrated development environments (IDEs). Unlike the dictionary-lookup strategy, AMAT uses a cloze-style fill-in-the-blank strategy for type inference. By treating code as text, AMAT leverages a fine-tuned large language model (LLM) as a neural knowledge base, avoiding the need for code compilation. Experimental results show that AMAT outperforms state-of-the-art tools such as COSTER and SNR. In practice, developers can directly reuse partial code by inferring the FQNs of unresolved type names in real time.
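The cloze-style strategy can be illustrated as follows: mask the unresolved simple name in the code text and ask the model to fill in the FQN. The mask token, prompt shape, and the mocked model call are assumptions for the sketch; AMAT's actual prompting is not given in the abstract.

```python
MASK = "<extra_id_0>"  # assumed mask token, as used by T5-style models

def build_cloze(partial_code, simple_name):
    """Replace one occurrence of a simple type name with a mask token,
    turning FQN inference into fill-in-the-blank over code-as-text."""
    return partial_code.replace(simple_name, MASK, 1)

def infer_fqn(partial_code, simple_name, fill_mask):
    """`fill_mask` stands in for the fine-tuned LLM's fill-in call."""
    return fill_mask(build_cloze(partial_code, simple_name))

# Mock "neural knowledge base" for demonstration only.
def mock_fill_mask(prompt):
    return "java.util.List" if MASK in prompt else None
```

Because the model consumes raw text, no compilable project or symbolic knowledge base is needed, which is exactly the limitation of dictionary-lookup tools that the abstract highlights.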
Funding: Supported by the National Natural Science Foundation of China under Grant No. 90818001 and the Natural Science Foundation of Shandong Province of China under Grant No. Y2007G24.
Abstract: Matching query interfaces is a crucial step in data integration across multiple Web databases. Different types of information about query interface schemas have been used to match attributes between schemas, but relying on a single type of information is not sufficient, and the results of individual matchers are often inaccurate and uncertain. Evidence theory is the state-of-the-art approach for combining multiple sources of uncertain information. However, traditional evidence theory treats the individual matchers in different matching tasks equally, which reduces matching performance for query interface matching. This paper proposes a novel query interface matching approach for the Deep Web based on extended evidence theory. The approach first introduces a procedure for dynamically predicting the credibility of each matcher. It then extends traditional evidence theory with these credibilities, using exponentially weighted evidence theory to combine the results of multiple matchers. Finally, it makes matching decisions using a set of heuristics to obtain the final matches. The approach overcomes the shortcomings of the traditional method and can adapt to different matching tasks. Experimental results demonstrate the feasibility and effectiveness of the proposed approach.
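The combination step can be sketched with classical Dempster-Shafer machinery. Note one substitution: standard Shafer discounting stands in for the paper's exponential weighting, whose exact form the abstract does not give; the frame, mass values, and credibilities are illustrative assumptions.

```python
from itertools import product

FRAME = frozenset({"match", "nomatch"})  # frame of discernment for one attribute pair

def discount(mass, alpha):
    """Shafer discounting: scale focal masses by credibility `alpha`, shifting
    the remainder onto total ignorance (the full frame). Stands in for the
    paper's exponential weighting, which the abstract does not spell out."""
    out = {A: alpha * v for A, v in mass.items() if A != FRAME}
    out[FRAME] = 1 - alpha + alpha * mass.get(FRAME, 0.0)
    return out

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions over FRAME."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b
    norm = 1.0 - conflict  # renormalize away the conflicting mass
    return {A: v / norm for A, v in combined.items()}
```

Discounting each matcher's mass function by its predicted credibility before combining is what lets a weak matcher's verdict carry less weight, which is the behavior the extended theory is after.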