Because users increasingly describe their information needs with unclear and imprecise keywords, it has become necessary to expand their original search queries with additional words that better capture their actual intent. The selection of suitable expansion terms generally depends on the degree of relatedness between each candidate expansion term and the query keywords. In this paper, we propose two criteria for evaluating this relatedness: (1) co-occurrence frequency, where more importance is attributed to terms occurring in the largest possible number of documents in which the query keywords appear; and (2) proximity, where more importance is assigned to terms that lie a short distance from the query terms within documents. We also employ the strength Pareto fitness assignment in order to satisfy both criteria simultaneously. The results of our numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly enhances retrieval performance compared to the baseline.
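To make the two criteria and the Pareto step concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes documents are token lists, that a document "contains the query" when at least one query keyword appears in it, that proximity is the mean inverse of the smallest token distance, and that the strength Pareto fitness follows the SPEA2-style raw fitness (sum of the strengths of all dominators, lower is better). All function names and the toy corpus are hypothetical.

```python
def cooccurrence(term, query, docs):
    """Number of documents containing the term and at least one query keyword (assumption)."""
    q = set(query)
    return sum(1 for d in docs if term in d and q.intersection(d))

def proximity(term, query, docs):
    """Mean of 1 / (smallest token distance to a query keyword); higher means closer (assumption)."""
    q = set(query)
    scores = []
    for d in docs:
        t_pos = [i for i, w in enumerate(d) if w == term]
        q_pos = [i for i, w in enumerate(d) if w in q]
        if t_pos and q_pos:
            dist = min(abs(i - j) for i in t_pos for j in q_pos)
            scores.append(1.0 / max(dist, 1))  # guard against a zero distance
    return sum(scores) / len(scores) if scores else 0.0

def dominates(a, b):
    """a Pareto-dominates b: no worse on every criterion, strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def strength_pareto_fitness(scores):
    """SPEA2-style raw fitness: sum of the strengths of all dominators (0 = non-dominated)."""
    strength = {t: sum(dominates(s, o) for u, o in scores.items() if u != t)
                for t, s in scores.items()}
    return {t: sum(strength[u] for u, o in scores.items() if u != t and dominates(o, s))
            for t, s in scores.items()}

# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["aspirin", "reduces", "fever", "and", "pain"],
    ["fever", "treatment", "with", "aspirin", "or", "ibuprofen"],
    ["chronic", "pain", "management"],
]
query = ["fever"]
candidates = ["aspirin", "pain", "ibuprofen"]

scores = {t: (cooccurrence(t, query, docs), proximity(t, query, docs)) for t in candidates}
fitness = strength_pareto_fitness(scores)
# Candidates with the lowest fitness are the preferred expansion terms.
ranked = sorted(candidates, key=lambda t: fitness[t])
```

In this toy run, "aspirin" wins on co-occurrence and "pain" wins on proximity, so neither dominates the other and both stay non-dominated, while "ibuprofen" is dominated by both. This is the kind of trade-off a Pareto-based assignment is meant to preserve, rather than collapsing the two criteria into a single weighted sum.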