Abstract: The present paper describes the use of free online language resources for translating and expanding queries in cross-language information retrieval (CLIR). In a previous study, we proposed a method in which queries were translated by two machine translation systems on the Language Grid. The queries were then expanded using an online dictionary to translate compound words and phrases. A concept base was used to compare the back-translated words with the original query so that mistranslated words could be deleted. To evaluate that method, we constructed a CLIR system and used the science documents of the NTCIR1 dataset. The method achieved high precision. However, proper nouns (names of people and places) appear infrequently in science documents, and in information retrieval proper nouns present unique problems. Since proper nouns are usually unknown words, they are difficult to find in monolingual dictionaries, not to mention bilingual dictionaries. Furthermore, the initial query of the user is not always the best description of the desired information. Query expansion is often proposed as a way to address this problem and create a better query representation. In the present study, Wikipedia was used to translate compound words and phrases. It was also used, together with a concept base, to expand queries. The NTCIR1 and NTCIR6 datasets were used to evaluate the proposed method. With the proposed method, the CLIR system achieved a high rate of precision. The proposed system ranked higher than the systems that participated in NTCIR1 and NTCIR6.
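The back-translation check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `CONCEPT_BASE`, the Jaccard-based `concept_similarity`, the `back_translate` lookup table, and the `threshold` value are all hypothetical stand-ins chosen for illustration. A real system would call a machine translation service and a full concept base instead.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy concept base: each word maps to a set of related concepts.
CONCEPT_BASE: Dict[str, Set[str]] = {
    "retrieval": {"search", "find", "information", "query"},
    "search": {"find", "retrieval", "query", "look"},
    "recovery": {"health", "repair", "restore"},
}

# Hypothetical back-translation table standing in for a machine translation call.
BACK_TRANSLATIONS: Dict[str, str] = {
    "検索": "search",
    "回復": "recovery",
}


def back_translate(word: str) -> str:
    """Return the toy back translation of a target-language word ('' if unknown)."""
    return BACK_TRANSLATIONS.get(word, "")


def concept_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two words' concept sets; identical words score 1.0."""
    if a == b:
        return 1.0
    concepts_a = CONCEPT_BASE.get(a, set())
    concepts_b = CONCEPT_BASE.get(b, set())
    if not concepts_a or not concepts_b:
        return 0.0
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)


def filter_by_back_translation(
    original: str,
    candidates: List[str],
    translate_back: Callable[[str], str],
    threshold: float = 0.3,  # assumed cutoff, not taken from the paper
) -> List[str]:
    """Keep candidates whose back translation is conceptually close to the original."""
    kept = []
    for candidate in candidates:
        back = translate_back(candidate)
        if concept_similarity(original, back) >= threshold:
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    print(filter_by_back_translation("retrieval", ["検索", "回復"], back_translate))
```

In this toy data, the candidate whose back translation shares concepts with the original query word is kept, and the candidate whose back translation shares none is dropped as a likely mistranslation.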