期刊文献+
共找到4篇文章
< 1 >
每页显示 20 50 100
Korean Morphological Analysis for Korean-Vietnamese Statistical Machine Translation
1
作者 Quang-Phuoc Nguyen Joon-Choul Shin Cheol-Young Ock 《Journal of Electronic Science and Technology》 CAS CSCD 2017年第4期413-419,共7页
This paper describes the experiments with Korean-to-Vietnamese statistical machine translation(SMT). The fact that Korean is a morphologically complex language that does not have clear optimal word boundaries causes a... This paper describes the experiments with Korean-to-Vietnamese statistical machine translation(SMT). The fact that Korean is a morphologically complex language that does not have clear optimal word boundaries causes a major problem of translating into or from Korean. To solve this problem, we present a method to conduct a Korean morphological analysis by using a pre-analyzed partial word-phrase dictionary(PWD).Besides, we build a Korean-Vietnamese parallel corpus for training SMT models by collecting text from multilingual magazines. Then, we apply such a morphology analysis to Korean sentences that are included in the collected parallel corpus as a preprocessing step. The experiment results demonstrate a remarkable improvement of Korean-to-Vietnamese translation quality in term of bi-lingual evaluation understudy(BLEU). 展开更多
关键词 Factored translation models Korean-Vietnamese parallel corpus morphological analysis statistical machine translation(SMT)
下载PDF
Research on high-performance English translation based on topic model
2
作者 Yumin Shen Hongyu Guo 《Digital Communications and Networks》 SCIE CSCD 2023年第2期505-511,共7页
Retelling extraction is an important branch of Natural Language Processing(NLP),and high-quality retelling resources are very helpful to improve the performance of machine translation.However,traditional methods based... Retelling extraction is an important branch of Natural Language Processing(NLP),and high-quality retelling resources are very helpful to improve the performance of machine translation.However,traditional methods based on the bilingual parallel corpus often ignore the document background in the process of retelling acquisition and application.In order to solve this problem,we introduce topic model information into the translation mode and propose a topic-based statistical machine translation method to improve the translation performance.In this method,Probabilistic Latent Semantic Analysis(PLSA)is used to obtains the co-occurrence relationship between words and documents by the hybrid matrix decomposition.Then we design a decoder to simplify the decoding process.Experiments show that the proposed method can effectively improve the accuracy of translation. 展开更多
关键词 Machine translation Topic model statistical machine translation Bilingual word vector RETELLING
下载PDF
A Substitution-Translation-Restoration Framework for Handling Unknown Words in Statistical Machine Translation 被引量:2
3
作者 张家俊 翟飞飞 宗成庆 《Journal of Computer Science & Technology》 SCIE EI CSCD 2013年第5期907-918,共12页
Unknown words are one of the key factors that greatly affect the translation quality. Traditionally, nearly all the related researches focus on obtaining the translation of the unknown words. However, these approaches... Unknown words are one of the key factors that greatly affect the translation quality. Traditionally, nearly all the related researches focus on obtaining the translation of the unknown words. However, these approaches have two disadvantages. On the one hand, they usually rely on many additional resources such as bilingual web data; on the other hand, they cannot guarantee good reordering and lexical selection of surrounding words. This paper gives a new perspective on handling unknown words in statistical machine translation (SMT). Instead of making great efforts to find the translation of unknown words, we focus on determining the semantic function of the unknown word in the test sentence and keeping the semantic function unchanged in the translation process. In this way, unknown words can help the phrase reordering and lexical selection of their surrounding words even though they still remain untranslated. In order to determine the semantic function of an unknown word, we employ the distributional semantic model and the bidirectional language model. Extensive experiments on both phrase-based and linguistically syntax-based SMT models in Chinese-to-English translation show that our method can substantially improve the translation quality. 展开更多
关键词 statistical machine translation distributional semantics bidirectional language model
原文传递
Topic-aware pivot language approach for statistical machine translation
4
作者 Jin-song SU Xiao-dong SHI +4 位作者 Yan-zhou HUANG Yang LIU Qing-qiang WU Yi-dong CHEN Huai-lin DONG 《Journal of Zhejiang University-Science C(Computers and Electronics)》 SCIE EI 2014年第4期241-253,共13页
The pivot language approach for statistical machine translation(SMT) is a good method to break the resource bottleneck for certain language pairs. However, in the implementation of conventional approaches, pivotside c... The pivot language approach for statistical machine translation(SMT) is a good method to break the resource bottleneck for certain language pairs. However, in the implementation of conventional approaches, pivotside context information is far from fully utilized, resulting in erroneous estimations of translation probabilities. In this study, we propose two topic-aware pivot language approaches to use different levels of pivot-side context. The first method takes advantage of document-level context by assuming that the bridged phrase pairs should be similar in the document-level topic distributions. The second method focuses on the effect of local context. Central to this approach are that the phrase sense can be reflected by local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in the latent sense distributions. Then, we build an interpolated model bringing the above methods together to further enhance the system performance. Experimental results on French-Spanish and French-German translations using English as the pivot language demonstrate the effectiveness of topic-based context in pivot-based SMT. 展开更多
关键词 Natural language processing Pivot-based statistical machine translation Topical context information
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部