Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (60073012).
Abstract: Meta search engines serve users by dispatching their requests to existing search engines, so the search engines that a meta search engine selects determine its search quality. Because both the performance of the existing search engines and the users' requests change dynamically, a fixed set of search engines cannot optimize the overall performance of a meta search engine. This paper applies a genetic algorithm (GA) to implement the scheduling strategy of the agent manager in our meta search engine, GSE (General Search Engine); the GA simulates the evolution of living organisms vividly and efficiently. By using the GA, the combination of search engines can be optimized, and hence the overall performance of GSE can be improved dramatically.
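The abstract does not give GSE's chromosome encoding or fitness function, so the following is only a minimal sketch of GA-based engine selection under stated assumptions: each genome is a bit string over candidate engines (1 means "dispatch the query to this engine"), `quality` is a hypothetical per-engine score (for example, recently measured precision), and `overlap` is a hypothetical pairwise redundancy penalty so that engines returning the same pages are not chosen together. Tournament selection, one-point crossover, bit-flip mutation, and elitism are generic GA choices, not necessarily the paper's.

```python
import random
from typing import List, Sequence, Tuple

Genome = List[int]  # genome[i] == 1 means "dispatch the query to engine i"


def repair(genome: Sequence[int], quality: Sequence[float], max_engines: int) -> Genome:
    """Keep at most `max_engines` selected engines, preferring higher quality."""
    chosen = sorted((i for i, g in enumerate(genome) if g),
                    key=lambda i: quality[i], reverse=True)
    keep = set(chosen[:max_engines])
    return [1 if i in keep else 0 for i in range(len(genome))]


def fitness(genome: Sequence[int], quality: Sequence[float],
            overlap: Sequence[Sequence[float]], penalty: float) -> float:
    """Hypothetical fitness: summed engine quality minus a penalty for overlapping engines."""
    chosen = [i for i, g in enumerate(genome) if g]
    score = sum(quality[i] for i in chosen)
    for a in range(len(chosen)):
        for b in range(a + 1, len(chosen)):
            score -= penalty * overlap[chosen[a]][chosen[b]]
    return score


def evolve(quality: Sequence[float], overlap: Sequence[Sequence[float]], max_engines: int,
           pop_size: int = 20, generations: int = 30, mutation_rate: float = 0.1,
           penalty: float = 1.0, seed: int = 0) -> Tuple[Genome, List[float]]:
    """Return the best engine combination and the best fitness of each generation."""
    rng = random.Random(seed)
    n = len(quality)

    def score(g: Sequence[int]) -> float:
        return fitness(g, quality, overlap, penalty)

    population = [repair([rng.randint(0, 1) for _ in range(n)], quality, max_engines)
                  for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        next_gen = [population[0][:]]  # elitism: the best genome survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, then enforce the engine budget
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(repair(child, quality, max_engines))
        population = next_gen
    return max(population, key=score), history


quality = [0.9, 0.8, 0.7, 0.4]          # hypothetical per-engine quality estimates
overlap = [[0.0, 0.9, 0.1, 0.1],
           [0.9, 0.0, 0.1, 0.1],
           [0.1, 0.1, 0.0, 0.1],
           [0.1, 0.1, 0.1, 0.0]]        # engines 0 and 1 return mostly the same pages
best, history = evolve(quality, overlap, max_engines=2)
```

Because engines 0 and 1 overlap heavily in this toy setup, the fitness scores the pair {0, 2} higher than the two individually strongest engines {0, 1}; elitism guarantees the best fitness never decreases from one generation to the next.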
Abstract: Search engines greatly help us find the information we want on the Internet. Most search engines rely on keyword matching. This paper discusses a Dynamic Knowledge Base based Search Engine (DKBSE), which expands the user's query using the concepts or meanings of its keywords. To do this, DKBSE constructs and maintains its knowledge base dynamically from the system's search results and the user's feedback. DKBSE expands the user's initial query using the knowledge base and returns the information retrieved with the expanded query.
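The abstract describes a loop of "learn from feedback, then expand the next query" without specifying how concepts are represented. As a rough sketch only, assuming the knowledge base is a simple term-to-term association table whose weights grow whenever a term co-occurs with a query term in a document the user marks relevant, the loop could look like this (`KnowledgeBase`, `rate`, `top_k`, and `min_weight` are all hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, Sequence


class KnowledgeBase:
    """Hypothetical dynamic knowledge base: term -> {related term: association weight}."""

    def __init__(self) -> None:
        self.assoc: Dict[str, Dict[str, float]] = defaultdict(dict)

    def learn_from_feedback(self, query_terms: Sequence[str],
                            relevant_doc_terms: Sequence[str], rate: float = 0.1) -> None:
        """Strengthen links between query terms and terms of documents the user marked relevant."""
        for q in query_terms:
            for t in set(relevant_doc_terms):
                if t != q:
                    self.assoc[q][t] = self.assoc[q].get(t, 0.0) + rate

    def expand(self, query_terms: Sequence[str], top_k: int = 2,
               min_weight: float = 0.15) -> List[str]:
        """Append up to `top_k` strongly associated terms per query term."""
        expanded = list(query_terms)
        for q in query_terms:
            related = sorted(self.assoc.get(q, {}).items(), key=lambda kv: (-kv[1], kv[0]))
            for term, weight in related[:top_k]:
                if weight >= min_weight and term not in expanded:
                    expanded.append(term)
        return expanded


kb = KnowledgeBase()
kb.learn_from_feedback(["jaguar"], ["car", "speed"])
kb.learn_from_feedback(["jaguar"], ["car", "engine"])
print(kb.expand(["jaguar"]))  # ['jaguar', 'car']
```

Only "car" has been confirmed relevant twice, so it is the only association strong enough to pass `min_weight`; a single piece of feedback is not trusted on its own.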
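Abstract: 1 Introduction. The World Wide Web is currently the largest information system in the world, and querying Web documents on the WWW relies mainly on index-based information systems on the Internet, such as Yahoo, Infoseek, AltaVista, WebCrawler, Excite, and Lycos. Because the WWW is enormous and poorly structured, and its Web servers are autonomous, Web document queries are difficult to make both comprehensive and precise. The quality of Web document queries is measured mainly in two respects: ① whether all relevant document resources can be found, with none left out.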
Funding: Supported by the National Natural Science Foundation of China (No. 90818007).
Abstract: Merging the results of multiple Independent Resource Retrieval Systems (IRRSs), a key component in developing a meta-search engine, is a difficult problem that has not yet been solved effectively. Most existing result merging methods are strongly influenced by the usefulness weights of the different IRRS results and by the overlap rate among them. In this paper, we propose a scheme that uses Discrete Particle Swarm Optimization (DPSO) to effectively combine and optimize the outputs of a group of existing multi-source result merging methods. Experimental results show that DPSO not only outperforms, overall, all the other result merging algorithms it employs, but also adapts better in practice, because it does not need to account for each IRRS's usefulness weight or the IRRSs' overlap rates for a specific query. Compared with the other result merging algorithms it employs, DPSO increases recognition precision by nearly 24.6% and decreases the standard deviation of precision across queries by about 68.3%.
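The abstract does not state the particle encoding or fitness function. A minimal sketch, assuming (1) each particle is a permutation of the pooled documents, (2) velocities are swap sequences, following the swap-operator formulation of discrete PSO often used for permutation problems, and (3) fitness is the total Spearman footrule distance to the rankings produced by the existing merging methods, so that the swarm searches for a consensus ranking. The swarm is seeded with those input rankings, so the result can never be worse than the best of them under this fitness. All function names and parameter values are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Swap = Tuple[int, int]


def footrule(candidate: Sequence[str], rankings: Sequence[Sequence[str]]) -> int:
    """Total Spearman footrule distance from `candidate` to every input ranking (lower is better)."""
    pos = {doc: i for i, doc in enumerate(candidate)}
    return sum(abs(pos[doc] - i) for r in rankings for i, doc in enumerate(r))


def swaps_to(target: Sequence[str], source: Sequence[str]) -> List[Swap]:
    """Swap sequence that turns `source` into `target` (the 'target minus source' operator)."""
    work = list(source)
    where = {doc: i for i, doc in enumerate(work)}
    seq: List[Swap] = []
    for i, doc in enumerate(target):
        j = where[doc]
        if j != i:
            seq.append((i, j))
            work[i], work[j] = work[j], work[i]
            where[work[i]], where[work[j]] = i, j
    return seq


def apply_swaps(perm: Sequence[str], swaps: Sequence[Swap]) -> List[str]:
    out = list(perm)
    for i, j in swaps:
        out[i], out[j] = out[j], out[i]
    return out


def dpso_merge(rankings: Sequence[Sequence[str]], n_particles: int = 12, iterations: int = 50,
               w: float = 0.5, c1: float = 0.7, c2: float = 0.9,
               seed: int = 0) -> Tuple[List[str], int]:
    """Search for a consensus ranking with swap-sequence discrete PSO."""
    rng = random.Random(seed)
    docs = list(rankings[0])
    swarm = [list(r) for r in rankings]  # seed the swarm with the existing merged lists
    while len(swarm) < n_particles:
        perm = docs[:]
        rng.shuffle(perm)
        swarm.append(perm)
    velocity: List[List[Swap]] = [[] for _ in swarm]
    pbest = [p[:] for p in swarm]
    pbest_cost = [footrule(p, rankings) for p in swarm]
    g = min(range(len(swarm)), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iterations):
        for k, x in enumerate(swarm):
            # Keep each swap with probability w (inertia), c1 (toward pbest), c2 (toward gbest)
            v = [s for s in velocity[k] if rng.random() < w]
            v += [s for s in swaps_to(pbest[k], x) if rng.random() < c1]
            v += [s for s in swaps_to(gbest, x) if rng.random() < c2]
            velocity[k] = v[: len(docs)]  # cap the velocity length
            swarm[k] = apply_swaps(x, velocity[k])
            cost = footrule(swarm[k], rankings)
            if cost < pbest_cost[k]:
                pbest[k], pbest_cost[k] = swarm[k][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = swarm[k][:], cost
    return gbest, gbest_cost


rankings = [["a", "b", "c", "d", "e"],
            ["b", "a", "c", "d", "e"],
            ["a", "b", "d", "c", "e"]]  # hypothetical outputs of three existing merging methods
best, cost = dpso_merge(rankings)
```

For the footrule objective alone, an exact optimum can also be found with minimum-cost bipartite matching, so the swarm here is illustrative; the PSO framing matters when the fitness is something without such a closed-form solver.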
Abstract: Cloud computing, and cloud services in particular, have become widely used in both the technology and business industries. Despite this widespread use, very little research and few commercial solutions focus on the discovery of cloud services. This paper introduces CSRecommender, a search engine and recommender system designed specifically for discovering these services. To make the system scale, we also describe the implementation of a Cloud Service Identifier, which enables the system to crawl the Internet without human involvement. Finally, we examine the effectiveness and usefulness of the system using real-world use cases and users.
Abstract: Information access, built on the rich data available for information retrieval, has evolved to provide principled approaches and strategies for searching. In building a successful web retrieval search engine model, a number of prospects arise at different levels, where techniques such as Usenet analysis and support vector machines can have a significant impact. The present investigation explores a number of problems related to finding information on the web and identifies the level at which each arises. The authors examine these issues and prospects by applying methods such as web graph analysis, the retrieval and analysis of newsgroup postings, and statistical methods for inferring meaning in text. The proposed model thus helps users find the existing information they need. The study proposes three heuristic models that characterize the balance between query and feedback information, so as to achieve adaptive relevance feedback. The authors also discuss the parameters responsible for efficient searching; these parameters can be taken into account in the future extension or development of search engines.
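The abstract does not define its three heuristics for balancing query and feedback information. As a familiar baseline for the same idea (not the authors' method), Rocchio's relevance-feedback formula makes the balance explicit through three weights: α on the original query, β on the centroid of relevant documents, and γ on the centroid of non-relevant documents. The sketch below uses sparse term-weight dictionaries; the default α = 1.0, β = 0.75, γ = 0.15 are commonly used illustrative values, and dropping negative weights is a common convention.

```python
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight vector


def centroid(docs: List[Vector], term: str) -> float:
    """Average weight of `term` across `docs` (0.0 if there are no docs)."""
    return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant); negatives dropped."""
    terms = set(query)
    for d in relevant + nonrelevant:
        terms |= set(d)
    updated: Vector = {}
    for t in terms:
        weight = (alpha * query.get(t, 0.0)
                  + beta * centroid(relevant, t)
                  - gamma * centroid(nonrelevant, t))
        if weight > 0:
            updated[t] = weight
    return updated


new_query = rocchio({"jaguar": 1.0},
                    relevant=[{"jaguar": 1.0, "car": 1.0}],
                    nonrelevant=[{"jaguar": 1.0, "cat": 1.0}])
```

Raising β relative to α shifts the query toward what the user confirmed; raising γ pushes it further from rejected results. An adaptive scheme would tune these weights per query instead of fixing them.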
Abstract: The use of agent technology in dynamic environments is growing rapidly, and agents have become one of the most powerful technologies available. Bringing the benefits of Intelligent Information Agent techniques to massive open online courses (MOOCs) is important for several reasons, including the rapid growth of MOOC environments and their focus on static rather than updated information. One of the main problems in such environments is keeping information updated to the needs of the student who is interacting at each moment. Agent technology can provide more flexible information and less wasted time, and hence greater gains in learning. This paper presents an Intelligent Topic-Based Information Agent that offers students updated knowledge, including various types of resources. Using the dominant meaning method, the agent searches the Internet, manages the metadata returned from the Internet, filters it, and presents it as categorized content lists. Two experiments were conducted on the Intelligent Topic-Based Information Agent: one measures the improvement in retrieval effectiveness, and the other measures the agent's impact on learning. The results indicate that our query expansion methodology yields a considerable improvement in retrieval effectiveness in all categories of the Google Web Search API. In addition, the agent has a positive impact on the performance of learning sessions.
Funding: Project supported by the National Natural Science Foundation of China (No. 60605020) and the National High-Tech R&D Program (863) of China (Nos. 2006AA01Z320 and 2006AA010105).
Abstract: Recently, we designed a new experimental system, MSearch, a cross-media meta-search system built on the database of the WikipediaMM task of ImageCLEF 2008. For a meta-search engine, the core problem is how to merge the results from multiple member search engines into a more effective ranked list. This paper presents a novel fusion model based on supervised learning. Our fusion model uses ranking SVM to train the fusion weight for each member search engine, associating each member search engine's fusion weight with a feature of the result documents returned by the meta-search engine. For each returned result document, we first build a feature vector to represent the document, setting the value of each feature to the document's score from the corresponding member search engine. We then construct a training set from the documents returned by the meta-search engine to learn the fusion parameters. Finally, we use a linear fusion model based on the overlap set to merge the result sets. Experimental results show that our approach significantly improves the performance of cross-media meta-search (MSearch) and outperforms many existing fusion methods.
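The pipeline in the abstract (one feature per member engine, a ranking SVM to learn one weight per feature, then linear fusion) can be sketched as follows. This is a minimal sketch, not MSearch's implementation: a production system would normally train with a dedicated SVM solver, whereas here a linear ranking SVM is trained by plain batch subgradient descent on the pairwise hinge loss so the example stays self-contained. The abstract does not specify how the overlap set enters the fusion, so `linear_fuse` simply assumes that an engine which did not return a document contributes 0 for it. The training pairs and scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

FeatureVec = Sequence[float]  # feature i = the document's score from member engine i


def train_rank_svm(pairs: Sequence[Tuple[FeatureVec, FeatureVec]], n_features: int,
                   c: float = 1.0, lr: float = 0.1, epochs: int = 100) -> List[float]:
    """Linear ranking SVM via batch subgradient descent on
    0.5*||w||^2 + c * sum(max(0, 1 - w . (x_preferred - x_other)))."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5*||w||^2 regularizer
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            if sum(wi * di for wi, di in zip(w, diff)) < 1.0:  # margin violated
                for i in range(n_features):
                    grad[i] -= c * diff[i]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def linear_fuse(result_lists: Sequence[Dict[str, float]], weights: Sequence[float]) -> List[str]:
    """Fused score = sum_i weights[i] * score_i(doc); an engine that did not return a doc adds 0."""
    fused: Dict[str, float] = {}
    for weight, results in zip(weights, result_lists):
        for doc, score in results.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical training pairs (preferred document first): engine 0's score separates
# relevant from non-relevant documents, while engine 1 scores both documents identically.
pairs = [([0.9, 0.3], [0.2, 0.3]), ([0.8, 0.6], [0.1, 0.6])]
weights = train_rank_svm(pairs, n_features=2)
ranking = linear_fuse([{"d1": 0.9, "d2": 0.2}, {"d2": 0.8, "d3": 0.5}], [2.0, 0.5])
```

Because engine 1 never distinguishes preferred from non-preferred documents in this toy data, the hinge loss never pushes its weight away from zero, while engine 0 receives a positive weight.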
Funding: Supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), NSF IIS-0905215, CNS-09-31975, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
Abstract: Information networks, which can be extracted from many domains, have been widely studied recently. Various functions for mining these networks have been proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies focus on homogeneous networks, where nodes and links are assumed to be of a single type. In reality, however, heterogeneous information networks model real-world systems better, since these systems are typically semi-structured and typed, following a network schema. To mine heterogeneous information networks directly, we propose exploring the meta structure of the information network, i.e., the network schema. The concept of meta-paths, defined as paths over the graph of the network schema, is proposed to systematically capture numerous semantic relationships across multiple types of objects. Meta-paths can guide search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks, such as relationship prediction and clustering, can be addressed by systematically exploring the network meta structure. Moreover, with users' guidance or feedback, we can select the best meta-path, or a weighted combination of meta-paths, for a specific mining task.
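To make meta-path-based similarity search concrete, here is a small sketch using PathSim, a published similarity measure defined on symmetric meta-paths. Multiplying the relation (adjacency) matrices along a meta-path gives a commuting matrix M, whose entry M[x][y] counts the path instances from x to y, and PathSim(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]). The toy bibliographic network, the author-paper-author (APA) and author-paper-venue-paper-author (APVPA) meta-paths, and the equal combination weights are illustrative assumptions; in the abstract's framework those weights would be chosen with user guidance or feedback.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def commuting_matrix(relations: Sequence[Matrix]) -> Matrix:
    """Multiply relation matrices along a meta-path; entry [x][y] counts path instances x -> y."""
    m = relations[0]
    for r in relations[1:]:
        m = matmul(m, r)
    return m


def pathsim(m: Matrix, x: int, y: int) -> float:
    """PathSim for a symmetric meta-path: 2*M[x][y] / (M[x][x] + M[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def combined_similarity(matrices: Sequence[Matrix], weights: Sequence[float],
                        x: int, y: int) -> float:
    """Weighted combination of PathSim scores over several meta-paths."""
    return sum(w * pathsim(m, x, y) for m, w in zip(matrices, weights))


# Toy bibliographic network: 3 authors, 3 papers, 2 venues.
W_AP = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]   # author writes paper
W_PV = [[1, 0], [0, 1], [1, 0]]            # paper published in venue
apa = commuting_matrix([W_AP, transpose(W_AP)])                          # co-authorship
apvpa = commuting_matrix([W_AP, W_PV, transpose(W_PV), transpose(W_AP)])  # shared venues
```

Authors 0 and 2 never co-author (APA similarity 0) but publish in a shared venue (APVPA similarity 2/3), which shows why the choice of meta-path, or a weighted mix of several, changes what "similar" means for a given task.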