Abstract: Parallel corpora are of great importance to machine translation, and automatic sentence alignment is the first step in processing them. This paper proposes a bilingual-dictionary-based sentence alignment method for Chinese-English parallel corpora, which differs from previous length-based algorithms in its knowledge-rich approach. Experimental results show that the method achieves over 93% accuracy with ordinary English-Chinese dictionaries whose translations cover 31.88% to 47.90% of the corpus.
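The idea of scoring candidate sentence pairs with a bilingual dictionary can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the toy lexicon, and the simple "fraction of English tokens whose translation appears in the Chinese sentence" score are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def dictionary_overlap_score(
    en_tokens: List[str],
    zh_sentence: str,
    lexicon: Dict[str, Set[str]],
) -> float:
    """Fraction of English tokens with at least one dictionary
    translation that occurs as a substring of the Chinese sentence.
    (Hypothetical scoring rule, chosen only for illustration.)"""
    if not en_tokens:
        return 0.0
    hits = 0
    for token in en_tokens:
        translations = lexicon.get(token.lower(), set())
        if any(t in zh_sentence for t in translations):
            hits += 1
    return hits / len(en_tokens)


def best_match(
    en_tokens: List[str],
    zh_candidates: List[str],
    lexicon: Dict[str, Set[str]],
) -> int:
    """Index of the Chinese candidate with the highest overlap score."""
    if not zh_candidates:
        raise ValueError("zh_candidates must not be empty")
    scores = [
        dictionary_overlap_score(en_tokens, zh, lexicon)
        for zh in zh_candidates
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy lexicon (assumed entries, not taken from the paper).
TOY_LEXICON: Dict[str, Set[str]] = {
    "machine": {"机器"},
    "translation": {"翻译"},
    "corpus": {"语料库"},
}
```

Because the score only counts tokens the dictionary covers, partial dictionary coverage lowers scores for all candidates alike, so the best-scoring candidate can still be identified.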
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61876120, 61673290).
Abstract: Sentence alignment is a basic task in natural language processing that aims to extract high-quality parallel sentences automatically. Motivated by the observation that aligned sentence pairs contain more aligned words than unaligned ones, we treat word translation as one of the most useful kinds of external knowledge. In this paper, we show how to explicitly integrate word translation into neural sentence alignment. Specifically, we propose three cross-lingual encoders to incorporate word translation: 1) a Mixed Encoder that learns annotation vectors for words and their translations over sequences in which words and their translations alternate; 2) a Factored Encoder that views word translations as features and encodes words and their translations by concatenating their embeddings; and 3) a Gated Encoder that uses a gate mechanism to selectively control the amount of word translation information moving forward. Experiments on the NIST MT and OpenSubtitles Chinese-English datasets, in both monotonic and non-monotonic scenarios, demonstrate that all the proposed encoders significantly improve sentence alignment performance.
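The difference between concatenating a translation embedding and gating it can be sketched as follows. This is a toy illustration in plain Python, not the authors' encoders: the function names, the gate parameterization (a sigmoid over the concatenated vectors), and the additive combination `word + gate * translation` are assumptions.

```python
import math
from typing import List

Vector = List[float]


def factored_combine(word_vec: Vector, trans_vec: Vector) -> Vector:
    """Treat the translation as an extra feature: concatenate embeddings."""
    return word_vec + trans_vec  # list concatenation


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_combine(
    word_vec: Vector,
    trans_vec: Vector,
    gate_weights: List[Vector],  # one weight row per output dimension (assumed shape)
    gate_bias: Vector,
) -> Vector:
    """Let a per-dimension gate in (0, 1) decide how much of the
    translation embedding is added to the word embedding."""
    features = word_vec + trans_vec
    combined = []
    for i, (w, t) in enumerate(zip(word_vec, trans_vec)):
        pre_activation = sum(a * b for a, b in zip(gate_weights[i], features))
        gate = sigmoid(pre_activation + gate_bias[i])
        combined.append(w + gate * t)
    return combined
```

With all-zero gate parameters every gate equals 0.5, so half of the translation embedding passes through; a strongly negative bias closes the gate and leaves the word embedding nearly unchanged.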
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61876120, 61751206, and 61673290.
Abstract: Sentence alignment provides multilingual and cross-lingual natural language processing (NLP) applications with high-quality parallel sentence pairs. Normally, an aligned sentence pair contains multiple aligned words, which intuitively play different roles during sentence alignment. Inspired by this intuition, we propose to address sentence alignment by exploring the semantic interactions among fine-grained word pairs within a neural network framework. In particular, we first employ several relevance measures to capture different kinds of semantic interaction among word pairs using a word-pair relevance network, and then model their importance using a multi-view attention network. Experimental results on both monotonic and non-monotonic bitexts show that the proposed approach significantly improves sentence alignment performance.
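A heavily simplified sketch of the two steps, scoring every word pair and then weighting the pairs by importance, is given below. It is not the paper's network: it uses a single relevance measure (cosine similarity) where the abstract mentions several, and the softmax weights are derived from the scores themselves rather than learned by an attention network. All names and the `temperature` parameter are assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance_matrix(src: List[Vector], tgt: List[Vector]) -> List[List[float]]:
    """One relevance score per (source word, target word) pair."""
    return [[cosine(s, t) for t in tgt] for s in src]


def attention_pooled_score(
    matrix: List[List[float]], temperature: float = 1.0
) -> float:
    """Softmax-weighted average of all word-pair scores, so that
    highly relevant pairs contribute more to the sentence-pair score."""
    flat = [x for row in matrix for x in row]
    if not flat:
        return 0.0
    peak = max(flat)  # subtract the max for numerical stability
    weights = [math.exp((x - peak) / temperature) for x in flat]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, flat)) / total
```

A sentence pair whose words match closely yields a higher pooled score than a pair with no related words, which is the signal a classifier on top of such features would use.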