Funding: National Natural Science Foundation of China (No. 60803078); National High Technology Research and Development Programs of China (No. 2006AA010107, No. 2006AA010108)
Abstract: This paper proposes a method for incorporating syntax-based language models into phrase-based statistical machine translation (SMT) systems. The syntax-based language model used here is based on link grammar, which is a highly lexicalized formalism. To apply link-grammar language models in phrase-based models, the paper introduces the concept of linked phrases, an extension of the traditional phrases used in phrase-based models. Experimental results show that using syntax-based language models can greatly improve the performance of phrase-based models.
Funding: Supported by the Institute for Information & Communications Technology Promotion under Grant No. R0101-16-0176, the Project of Core Technology Development for Human-Like Self-Taught Learning Based on Symbolic Approach
Abstract: This paper describes experiments with Korean-to-Vietnamese statistical machine translation (SMT). Korean is a morphologically complex language without clear optimal word boundaries, which causes a major problem when translating into or from Korean. To solve this problem, we present a method for Korean morphological analysis that uses a pre-analyzed partial word-phrase dictionary (PWD). In addition, we build a Korean-Vietnamese parallel corpus for training SMT models by collecting text from multilingual magazines. We then apply the morphological analysis to the Korean sentences in the collected parallel corpus as a preprocessing step. The experimental results demonstrate a remarkable improvement in Korean-to-Vietnamese translation quality in terms of bilingual evaluation understudy (BLEU).
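For readers unfamiliar with the metric named above, BLEU combines clipped n-gram precisions with a brevity penalty. The following is a minimal sketch, assuming a single reference, pre-tokenized input, and no smoothing; the function names `ngram_counts` and `sentence_bleu` are hypothetical, and the abstract does not say which BLEU implementation or settings the authors used (BLEU is normally reported at corpus level).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU for one sentence (illustrative only)."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision += math.log(clipped / total) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_precision)
```

In this sketch a candidate identical to its reference scores 1.0, and a candidate sharing no words with the reference scores 0.0.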
Abstract: Companies like Google, MSN and Yahoo provide translation services on their websites, generating translations based on statistical bilingual text corpora. Human translation can seem inferior in the face of the huge volume of information and the rapid development of computer science. Despite the capabilities and versatility of statistical machine translation, it may never replace human effort. Teachers should guide students in using online translation systems.
Funding: The High Technology Research and Development Program of China (No. 2004AA117010-08).
Abstract: A loose phrase extraction method is proposed and applied to phrase-based statistical machine translation. The method extracts phrase pairs that are not strictly consistent with word alignments. Two types of constraints on word positions are investigated for this method. Furthermore, n-best alignments are used for phrase extraction instead of the one-best alignment. Experimental results show that the proposed approach outperforms the baseline Pharaoh system for both one-best and n-best alignments.
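To make "strictly consistent with word alignments" concrete, the sketch below shows the standard consistency check that the loose method relaxes. It is not the paper's loose extraction: the abstract does not specify its two position constraints, so they are not modeled here. The function name `extract_consistent_phrases` and the `max_len` limit are hypothetical, and the sketch is simplified in that it keeps only the minimal target span and does not extend phrases over unaligned target words.

```python
def extract_consistent_phrases(src_len, alignment, max_len=4):
    """Return phrase pairs as (src_start, src_end, tgt_start, tgt_end), inclusive.

    alignment is a set of (source_index, target_index) links.
    A pair is consistent when no word inside either span is aligned
    to a word outside the other span.
    """
    pairs = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            # Target positions linked to any source word in [s1, s2].
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue  # skip source spans with no alignment links
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: every link touching [t1, t2] must start in [s1, s2].
            if all(s1 <= s <= s2 for (s, t) in alignment if t1 <= t <= t2):
                pairs.add((s1, s2, t1, t2))
    return pairs
```

For example, with three source words and the alignment {(0, 0), (1, 2), (2, 1)}, the source span 0 to 1 is rejected because target word 1, which falls inside the induced target span, is aligned to source word 2 outside the span. A loose method in the sense of the abstract would admit some such pairs under additional constraints.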