We propose a method that can achieve the Naxi-English bilingual word automatic alignment based on a log-linear model.This method defines the different Naxi-English structural feature functions,which are English-Naxi i...We propose a method that can achieve the Naxi-English bilingual word automatic alignment based on a log-linear model.This method defines the different Naxi-English structural feature functions,which are English-Naxi interval switching function and Naxi-English bilingual word position transformation function.With the manually labeled Naxi-English words alignment corpus,the parameters of the model are trained by using the minimum error,thus Naxi-English bilingual word alignment is achieved automatically.Experiments are conducted with IBM Model 3 as a benchmark,and the Naxi language constraints are introduced.The final experiment results show that the proposed alignment method achieves very good results:the introduction of the language characteristic function can effectively improve the accuracy of the Naxi-English Bilingual Word Alignment.展开更多
Text alignment is crucial to the accuracy of MT (Machine Translation) systems, some NLP (Natural Language Processing) tools or any other text processing tasks requiring bilingual data. This research proposes a lan...Text alignment is crucial to the accuracy of MT (Machine Translation) systems, some NLP (Natural Language Processing) tools or any other text processing tasks requiring bilingual data. This research proposes a language independent sentence alignment approach based on Polish (not position-sensitive language) to English experiments. This alignment approach was developed on the TED (Translanguage English Database) talks corpus, but can be used for any text domain or language pair. The proposed approach implements various heuristics for sentence recognition. Some of them value synonyms and semantic text structure analysis as a part of additional information. Minimization of data loss was ensured. The solution is compared to other sentence alignment implementations. Also an improvement in MT system score with text processed with the described tool is shown.展开更多
基金supported by the National Nature Science Foundation of China under Grants No.60863011,No.61175068,No.61100205,No.60873001the Fundamental Research Funds for the Central Universities under Grant No.2009RC0212+1 种基金the National Innovation Fund for Technology-based Firms under Grant No.11C26215305905the Open Fund of Software Engineering Key Laboratory of Yunnan Province under Grant No.2011SE14
文摘We propose a method that can achieve the Naxi-English bilingual word automatic alignment based on a log-linear model.This method defines the different Naxi-English structural feature functions,which are English-Naxi interval switching function and Naxi-English bilingual word position transformation function.With the manually labeled Naxi-English words alignment corpus,the parameters of the model are trained by using the minimum error,thus Naxi-English bilingual word alignment is achieved automatically.Experiments are conducted with IBM Model 3 as a benchmark,and the Naxi language constraints are introduced.The final experiment results show that the proposed alignment method achieves very good results:the introduction of the language characteristic function can effectively improve the accuracy of the Naxi-English Bilingual Word Alignment.
文摘Text alignment is crucial to the accuracy of MT (Machine Translation) systems, some NLP (Natural Language Processing) tools or any other text processing tasks requiring bilingual data. This research proposes a language independent sentence alignment approach based on Polish (not position-sensitive language) to English experiments. This alignment approach was developed on the TED (Translanguage English Database) talks corpus, but can be used for any text domain or language pair. The proposed approach implements various heuristics for sentence recognition. Some of them value synonyms and semantic text structure analysis as a part of additional information. Minimization of data loss was ensured. The solution is compared to other sentence alignment implementations. Also an improvement in MT system score with text processed with the described tool is shown.