In Chinese question answering systems, natural language questions carry more semantic relations than bare query words, so precision can be improved by expanding the query when questions are used to retrieve documents. This paper proposes a new approach to query expansion based on semantics and statistics. First, an automatic relevance feedback method is used to generate a candidate expansion word set. Then the expansion words are selected from this set based on the semantic similarity and semantic relevancy between the candidate words and the original words. Experiments show that the new approach is effective for Web retrieval and outperforms conventional expansion approaches.
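The two stages described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the frequency-based candidate scoring, the threshold, and the toy similarity table are all assumptions, since the abstract does not specify the semantic resource or the selection rule.

```python
from collections import Counter

def candidate_terms(query, ranked_docs, k=2, n=5):
    """Stage 1 (assumed form of relevance feedback): collect the n most
    frequent non-query terms from the top-k retrieved documents."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc if t not in query)
    return [t for t, _ in counts.most_common(n)]

def select_expansion(query, candidates, sim, threshold=0.5):
    """Stage 2 (assumed rule): keep a candidate if its best similarity to
    any original query word reaches the threshold."""
    return [c for c in candidates
            if max(sim(c, q) for q in query) >= threshold]

# Hypothetical similarity scores standing in for a real semantic resource.
TOY_SIM = {("automobile", "car"): 0.9, ("cost", "price"): 0.8,
           ("dealer", "car"): 0.3}

def toy_sim(a, b):
    return TOY_SIM.get((a, b), 0.0)

query = ["car", "price"]
ranked_docs = [["car", "automobile", "price", "cost"],
               ["automobile", "dealer", "price"],
               ["banana"]]
candidates = candidate_terms(query, ranked_docs)
expansion = select_expansion(query, candidates, toy_sim)
```

Here "dealer" is generated as a candidate but rejected in the second stage because its similarity to every query word falls below the threshold.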
To eliminate the mismatch between the words of relevant documents and those of the user's query, and the serious negative effects this mismatch has on information retrieval performance, a query expansion method based on a new term co-occurrence representation was put forward by analyzing the process by which queries are produced. Expansion terms were selected according to their correlation with the whole query, and the position information between terms was also considered. Experimental results on Text REtrieval Conference (TREC) data collections show that the proposed method consistently improves on the language modeling method without expansion by 5%~19%. Compared with the popular pseudo feedback approach to query expansion, the precision of the proposed method is competitive.
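One way to picture a position-aware co-occurrence score against the whole query is sketched below. The abstract does not give the actual representation, so the window size, the 1/distance weighting, and the function name are assumptions chosen only for illustration.

```python
from collections import defaultdict

def positional_scores(query, docs, window=5):
    """Score each non-query term by its co-occurrence with ALL query terms,
    weighting every co-occurrence by 1/distance (assumed weighting)."""
    q = set(query)
    scores = defaultdict(float)
    for doc in docs:
        for i, term in enumerate(doc):
            if term in q:
                continue
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if doc[j] in q:
                    # i != j here, because doc[i] is not a query term
                    scores[term] += 1.0 / abs(i - j)
    return dict(scores)

docs = [["query", "expansion", "adds", "terms", "for", "retrieval"]]
scores = positional_scores(["query", "retrieval"], docs)
```

In this toy document, "expansion" sits next to a query term and therefore scores higher than "adds", which is further from both query terms.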
One important cause of low precision is identified: inaccurate expression of the query. A new method of automatic query expansion based on tolerance rough sets is therefore put forward. In the algorithm, the uncertain connection between query terms and retrieved documents is described as a term tolerance class, and the upper approximation set of the query sentence is taken as the query expansion. The newly added terms are also assigned weights. Experimental results on a collection of 5,000 Web pages from Google show that the approach is effective for query expansion and achieves high search precision.
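The idea of tolerance classes and an upper approximation can be illustrated with the usual tolerance rough set construction, in which two terms are tolerant if they co-occur in at least theta documents. This is a sketch under that assumption; the abstract does not state how the tolerance relation is defined, and the term weighting step is omitted.

```python
from collections import defaultdict
from itertools import combinations

def tolerance_classes(docs, theta):
    """Tolerance class of t: t itself plus every term that co-occurs with t
    in at least theta documents (assumed tolerance relation)."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes

def upper_approximation(query, classes):
    """All terms whose tolerance class intersects the query term set."""
    q = set(query)
    return {t for t, cls in classes.items() if cls & q} | q

docs = [["rough", "set", "query"],
        ["rough", "set", "tolerance"],
        ["query", "expansion"],
        ["image", "pixel"]]
strict = upper_approximation(["rough"], tolerance_classes(docs, theta=2))
loose = upper_approximation(["rough"], tolerance_classes(docs, theta=1))
```

Lowering theta makes the tolerance relation more permissive, so the upper approximation, and with it the expanded query, grows.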
Neural networks have attracted considerable research attention in the last couple of years because of their wide application in areas such as data mining, natural language processing, image processing, and information retrieval. Word embeddings have been applied by many researchers to information retrieval tasks. In this paper, a word embedding-based skip-gram model is developed for the query expansion task. Vocabulary terms are obtained from the top “k” initially retrieved documents using the pseudo relevance feedback model, and the skip-gram model is then trained on them to find expansion terms for the user query. The model achieves a mean average precision (MAP) of 0.3176. Compared with the original query, the BM25 model, and query expansion with the Chi-Square model, the proposed model improves MAP by 6.61%, 6.93%, and 9.07%, respectively. It also retrieves 84, 25, and 81 additional relevant documents compared with the original query, query expansion with the Chi-Square model, and the BM25 model, respectively, and thus also improves recall. Per-query analysis shows that the proposed model performs better on 30, 36, and 30 queries than the original query, query expansion with the Chi-Square model, and the BM25 model, respectively.
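For readers unfamiliar with the metric reported here, mean average precision can be computed as in the sketch below. It uses the standard definition of the metric; the document identifiers and rankings are made-up toy data and have nothing to do with the paper's experiments.

```python
def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant
    document appears, divided by the number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy data (hypothetical): two queries with their rankings and judgments.
runs = [(["d1", "d2", "d3", "d4"], {"d1", "d3"}),
        (["d2", "d1"], {"d1"})]
map_score = mean_average_precision(runs)
```

For the first toy query the relevant documents appear at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; the second query contributes 1/2.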
Existing query expansion (QE) methods sometimes fail to find the source code version that users request most, because noise causes over-expansion. To solve this problem, we propose a QE method based on evolving contexts (EC), which consist of the terms added or deleted during code evolution and their dependent terms. When expanding a query, we append the added terms as relevant terms and exclude the deleted terms as noisy terms. We also develop a QE-integrating framework based on Support Vector Machine (SVM) ranking, called QESR, to integrate multiple QE methods simultaneously. Our experiment shows that QESR outperforms the state-of-the-art QE methods CodeHow and Query Expansion based on Crowd Knowledge (QECK) by 13%-16% in terms of precision when the first query result is inspected.
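The append/exclude rule for evolving contexts can be sketched with plain set differences between two code versions. This is only an illustration under simplifying assumptions: the function names are hypothetical, the versions are treated as flat token lists, and dependent terms and the SVM-based integration (QESR) are not modeled.

```python
def evolving_context(old_tokens, new_tokens):
    """Return (added, deleted) terms between two versions of the code."""
    old, new = set(old_tokens), set(new_tokens)
    return new - old, old - new

def expand_with_context(query, candidates, added, deleted):
    """Append candidate and added terms to the query, skipping any term
    that was deleted during evolution (treated as noise)."""
    expanded = list(query)
    for term in list(candidates) + sorted(added):
        if term not in deleted and term not in expanded:
            expanded.append(term)
    return expanded

# Toy versions of a code unit (hypothetical tokens).
added, deleted = evolving_context(["parse", "xml", "file"],
                                  ["parse", "json", "file"])
expanded = expand_with_context(["load", "config"], ["xml", "file"],
                               added, deleted)
```

The candidate "xml", which another expansion method might have proposed, is dropped because it was deleted in the newer version, while "json" is appended as an added term.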
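In semantics-based query expansion, computing term-concept relatedness is a key step in finding the concepts that describe the semantics of a query need. For this computation, a method called K2CM (keyword to concept method) is proposed. K2CM computes term-concept relatedness from two aspects: the term-document-concept membership degree and the term-concept co-occurrence degree. The term-document-concept membership degree is derived from the membership of terms in concepts within an annotated document collection, that is, a term occurs in a number of documents and each document is annotated with a number of concepts. The term-concept co-occurrence degree extends the co-occurrence of term-concept pairs by also considering the text distance and the document distribution of the pairs. Semantic retrieval experiments on three different types of data sets show that, compared with traditional methods, K2CM-based semantic query expansion improves query effectiveness.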
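A very rough reading of the term-document-concept membership degree is sketched below: among the documents that contain a term, the fraction annotated with a given concept. This ratio is an assumption made for illustration; the abstract does not give the K2CM formula, and the co-occurrence degree with text distance and document distribution is not modeled here.

```python
def membership_degree(term, concept, corpus):
    """corpus: list of (token_set, concept_set) pairs for annotated documents.
    Returns the share of documents containing `term` that carry `concept`
    (assumed definition, not the published K2CM formula)."""
    with_term = [concepts for tokens, concepts in corpus if term in tokens]
    if not with_term:
        return 0.0
    return sum(concept in concepts for concepts in with_term) / len(with_term)

# Toy annotated collection (hypothetical).
corpus = [({"jaguar", "speed"}, {"Animal"}),
          ({"jaguar", "engine"}, {"Car"}),
          ({"jaguar", "cat"}, {"Animal"}),
          ({"engine"}, {"Car"})]
degree = membership_degree("jaguar", "Animal", corpus)
```

In this toy collection, "jaguar" occurs in three documents, two of which are annotated with the concept "Animal".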
Funding (Chinese question answering query expansion): the Specialized Research Program Fund for the Doctoral Program of Higher Education of China (20050007023); the Natural Science Foundation of Shandong Province (Y2004G04)
Funding (term co-occurrence query expansion): the High Technology Research and Development Program of China (No. 2006AA01Z150); the National Natural Science Foundation of China (No. 60435020)
Funding (tolerance rough set query expansion): supported by the National Natural Science Foundation of China (60403027)
Funding (evolving-context query expansion, QESR): supported by the Science and Technology Project of Jiangxi Education Department (GJJ161151); the School-Level Team Building Project (JXTD1404)