Query expansion with thesaurus is one of the useful techniques in modern information retrieval (IR). In this paper, a method of query expansion for Chinese IR by using a decaying co-occurrence model is proposed and re...Query expansion with thesaurus is one of the useful techniques in modern information retrieval (IR). In this paper, a method of query expansion for Chinese IR by using a decaying co-occurrence model is proposed and realized. The model is an extension of the traditional co-occurrence model by adding a decaying factor that decreases the mutual information when the distance between the terms increases. Experimental results on TREC-9 collections show this query expansion method results in significant improvements over the IR without query expansion.展开更多
To eliminate the mismatch between words of relevant documents and user's query and more seriousnegative effects it has on the performance of information retrieval,a method of query expansion on the ba-sis of new t...To eliminate the mismatch between words of relevant documents and user's query and more seriousnegative effects it has on the performance of information retrieval,a method of query expansion on the ba-sis of new terms co-occurrence representation was put forward by analyzing the process of producingquery.The expansion terms were selected according to their correlation to the whole query.At the sametime,the position information between terms were considered.The experimental result on test retrievalconference(TREC)data collection shows that the method proposed in the paper has made an improve-ment of 5%~19% all the time than the language modeling method without expansion.Compared to thepopular approach of query expansion,pseudo feedback,the precision of the proposed method is competi-tive.展开更多
The neural network has attracted researchers immensely in the last couple of years due to its wide applications in various areas such as Data mining,Natural language processing,Image processing,and Information retriev...The neural network has attracted researchers immensely in the last couple of years due to its wide applications in various areas such as Data mining,Natural language processing,Image processing,and Information retrieval etc.Word embedding has been applied by many researchers for Information retrieval tasks.In this paper word embedding-based skip-gram model has been developed for the query expansion task.Vocabulary terms are obtained from the top“k”initially retrieved documents using the Pseudo relevance feedback model and then they are trained using the skip-gram model to find the expansion terms for the user query.The performance of the model based on mean average precision is 0.3176.The proposed model compares with other existing models.An improvement of 6.61%,6.93%,and 9.07%on MAP value is observed compare to the Original query,BM25 model,and query expansion with the Chi-Square model respectively.The proposed model also retrieves 84,25,and 81 additional relevant documents compare to the original query,query expansion with Chi-Square model,and BM25 model respectively and thus improves the recall value also.The per query analysis reveals that the proposed model performs well in 30,36,and 30 queries compare to the original query,query expansion with Chi-square model,and BM25 model respectively.展开更多
Information retrieval (IR) systems are designed to help information seekers retrieving relevant information from vast document. The need for relevant information from a vast amount of document gave birth to IR systems...Information retrieval (IR) systems are designed to help information seekers retrieving relevant information from vast document. The need for relevant information from a vast amount of document gave birth to IR systems. Even though different IR systems exist, they cannot meet all users’ expectations. A different level of users’ knowledge makes queries to be expressed in different ways. As a result, the system may miss the core meaning of users query and retrieve dissatisfactory results. This happens mainly because of the ambiguities of words involved in the natural languages and expression mismatch among users and authors. The existing ambiguities in Amharic language have negative impacts on the performance of Amharic IR system. Some of the ambiguities for this type of problem are: spelling variants of the same word, polysemous and synonymous terms. If users are not fully knowledgeable about the information domain area, they will mostly formulate weak queries to retrieve documents. Thus, they end up frustrated with the results found from an IR system. This research has been conducted, aiming at augmenting the recall of previous work. Statistical co-occurrence technique has been used in order to expand query terms. The main reason for performing query expansion is to provide relevant documents as per users’ query that can satisfy their information need. Statistical co-occurrence method considers, frequently appearing terms with the query term, regardless of their position. The efficiency of proposed technique has been tested on the prototype system and the result found compared with the result of previous study. Accordingly, 6% recall and 2% f-measure improvement has been made. Hence, the statistical co-occurrence method outperformed the bi-gram based IR system.展开更多
The use of agent technology in a dynamic environment is rapidly growing as one of the powerful technologies and the need to provide the benefits of the Intelligent Information Agent technique to massive open online co...The use of agent technology in a dynamic environment is rapidly growing as one of the powerful technologies and the need to provide the benefits of the Intelligent Information Agent technique to massive open online courses, is very important from various aspects including the rapid growing of MOOCs environments, and the focusing more on static information than on updated information. One of the main problems in such environment is updating the information to the needs of the student who interacts at each moment. Using such technology can ensure more flexible information, lower waste time and hence higher earnings in learning. This paper presents Intelligent Topic-Based Information Agent to offer an updated knowledge including various types of resource for students. Using dominant meaning method, the agent searches the Internet, controls the metadata coming from the Internet, filters and shows them into a categorized content lists. There are two experiments conducted on the Intelligent Topic-Based Information Agent: one measures the improvement in the retrieval effectiveness and the other measures the impact of the agent on the learning. The experiment results indicate that our methodology to expand the query yields a considerable improvement in the retrieval effectiveness in all categories of Google Web Search API. On the other hand, there is a positive impact on the performance of learning session.展开更多
文摘Query expansion with thesaurus is one of the useful techniques in modern information retrieval (IR). In this paper, a method of query expansion for Chinese IR by using a decaying co-occurrence model is proposed and realized. The model is an extension of the traditional co-occurrence model by adding a decaying factor that decreases the mutual information when the distance between the terms increases. Experimental results on TREC-9 collections show this query expansion method results in significant improvements over the IR without query expansion.
基金the High Technology Research and Development Program of China(No.2006AA01Z150)the National Natural Science Foundation of China(No.60435020)
文摘To eliminate the mismatch between words of relevant documents and user's query and more seriousnegative effects it has on the performance of information retrieval,a method of query expansion on the ba-sis of new terms co-occurrence representation was put forward by analyzing the process of producingquery.The expansion terms were selected according to their correlation to the whole query.At the sametime,the position information between terms were considered.The experimental result on test retrievalconference(TREC)data collection shows that the method proposed in the paper has made an improve-ment of 5%~19% all the time than the language modeling method without expansion.Compared to thepopular approach of query expansion,pseudo feedback,the precision of the proposed method is competi-tive.
文摘The neural network has attracted researchers immensely in the last couple of years due to its wide applications in various areas such as Data mining,Natural language processing,Image processing,and Information retrieval etc.Word embedding has been applied by many researchers for Information retrieval tasks.In this paper word embedding-based skip-gram model has been developed for the query expansion task.Vocabulary terms are obtained from the top“k”initially retrieved documents using the Pseudo relevance feedback model and then they are trained using the skip-gram model to find the expansion terms for the user query.The performance of the model based on mean average precision is 0.3176.The proposed model compares with other existing models.An improvement of 6.61%,6.93%,and 9.07%on MAP value is observed compare to the Original query,BM25 model,and query expansion with the Chi-Square model respectively.The proposed model also retrieves 84,25,and 81 additional relevant documents compare to the original query,query expansion with Chi-Square model,and BM25 model respectively and thus improves the recall value also.The per query analysis reveals that the proposed model performs well in 30,36,and 30 queries compare to the original query,query expansion with Chi-square model,and BM25 model respectively.
文摘Information retrieval (IR) systems are designed to help information seekers retrieving relevant information from vast document. The need for relevant information from a vast amount of document gave birth to IR systems. Even though different IR systems exist, they cannot meet all users’ expectations. A different level of users’ knowledge makes queries to be expressed in different ways. As a result, the system may miss the core meaning of users query and retrieve dissatisfactory results. This happens mainly because of the ambiguities of words involved in the natural languages and expression mismatch among users and authors. The existing ambiguities in Amharic language have negative impacts on the performance of Amharic IR system. Some of the ambiguities for this type of problem are: spelling variants of the same word, polysemous and synonymous terms. If users are not fully knowledgeable about the information domain area, they will mostly formulate weak queries to retrieve documents. Thus, they end up frustrated with the results found from an IR system. This research has been conducted, aiming at augmenting the recall of previous work. Statistical co-occurrence technique has been used in order to expand query terms. The main reason for performing query expansion is to provide relevant documents as per users’ query that can satisfy their information need. Statistical co-occurrence method considers, frequently appearing terms with the query term, regardless of their position. The efficiency of proposed technique has been tested on the prototype system and the result found compared with the result of previous study. Accordingly, 6% recall and 2% f-measure improvement has been made. Hence, the statistical co-occurrence method outperformed the bi-gram based IR system.
文摘The use of agent technology in a dynamic environment is rapidly growing as one of the powerful technologies and the need to provide the benefits of the Intelligent Information Agent technique to massive open online courses, is very important from various aspects including the rapid growing of MOOCs environments, and the focusing more on static information than on updated information. One of the main problems in such environment is updating the information to the needs of the student who interacts at each moment. Using such technology can ensure more flexible information, lower waste time and hence higher earnings in learning. This paper presents Intelligent Topic-Based Information Agent to offer an updated knowledge including various types of resource for students. Using dominant meaning method, the agent searches the Internet, controls the metadata coming from the Internet, filters and shows them into a categorized content lists. There are two experiments conducted on the Intelligent Topic-Based Information Agent: one measures the improvement in the retrieval effectiveness and the other measures the impact of the agent on the learning. The experiment results indicate that our methodology to expand the query yields a considerable improvement in the retrieval effectiveness in all categories of Google Web Search API. On the other hand, there is a positive impact on the performance of learning session.