Abstract: Parallel corpora are of great importance to machine translation, and automatic sentence alignment is the first step in processing them. This paper proposes a bilingual-dictionary-based sentence alignment method for Chinese-English parallel corpora, which differs from previous length-based algorithms in its knowledge-rich approach. Experimental results show that the method achieves over 93% accuracy with ordinary English-Chinese dictionaries whose translations cover 31.88% to 47.90% of the corpus.
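The idea of dictionary-based alignment can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy dictionary `TOY_DICT`, the scoring function `match_score`, the `skip_penalty` value, and the restriction to 1-1 matches and skips are all assumptions made for illustration. Chinese word segmentation is avoided by testing whether a dictionary translation occurs as a substring of the Chinese sentence.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy dictionary: English word -> set of Chinese translations.
TOY_DICT: Dict[str, Set[str]] = {
    "machine": {"机器"},
    "translation": {"翻译"},
    "corpus": {"语料库"},
    "sentence": {"句子"},
    "alignment": {"对齐"},
}


def match_score(en: str, zh: str, lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of English tokens with a dictionary translation found in zh."""
    tokens = en.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if any(tr in zh for tr in lexicon.get(t, ())))
    return hits / len(tokens)


def align(
    en_sents: List[str],
    zh_sents: List[str],
    lexicon: Dict[str, Set[str]],
    skip_penalty: float = 0.1,  # assumed cost of leaving a sentence unaligned
) -> List[Tuple[int, int]]:
    """Dynamic programming over 1-1 matches and skips (1-0, 0-1).

    Returns the (english_index, chinese_index) pairs on the best path.
    """
    n, m = len(en_sents), len(zh_sents)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            cur = best[i][j]
            if cur == neg_inf:
                continue
            # 1-1 match: pair English sentence i with Chinese sentence j.
            if i < n and j < m:
                s = cur + match_score(en_sents[i], zh_sents[j], lexicon)
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1], back[i + 1][j + 1] = s, (i, j)
            # 1-0: skip an English sentence.
            if i < n and cur - skip_penalty > best[i + 1][j]:
                best[i + 1][j], back[i + 1][j] = cur - skip_penalty, (i, j)
            # 0-1: skip a Chinese sentence.
            if j < m and cur - skip_penalty > best[i][j + 1]:
                best[i][j + 1], back[i][j + 1] = cur - skip_penalty, (i, j)

    # Trace back from the end, collecting diagonal (1-1) steps.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while back[i][j] is not None:
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

A real system would also need many-to-one alignments (1-2, 2-1) and a dictionary with far broader coverage; the sketch only shows how lexical evidence, rather than sentence length, can drive the alignment path.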
Abstract: After describing why a Jiangxi red tourism resource E-C/C-E bilingual parallel corpus is needed, this paper discusses the design and construction of the corpus. On the design side, it describes the general design and framework of the corpus; on the construction side, it covers data collection, the standard for sorted data, data selection, data digitization, data tagging, and data alignment. The construction is intended not only to realize the purposes and functions of the corpus, but also to offer others ways to use it and to build corpora of this kind.
Abstract: Objective: To investigate the neural electrophysiological activity underlying Chinese and English Stroop tasks in Chinese-English bilinguals. Methods: Event-related potentials (ERPs) were recorded from 14 Chinese bilinguals with a moderate command of English while they performed the Stroop task presented in English words and in Chinese characters, respectively. Results: In the Chinese version of the task, incongruent trials elicited an increased positivity over bilateral frontopolar regions compared with congruent trials, followed by an increased negativity over the fronto-central region and an increased positivity over the occipital region. In the English version, only the increased negativity over the fronto-central region was observed, but with reduced amplitude and an anterior distribution. Conclusion: The increased negativity is proposed as an index of the resolution of conflicting information in the incongruent condition. The increased positivity over the occipital region on Chinese incongruent trials may indicate a visual rechecking effect for Chinese characters.
Funding: Supported by the National Basic Research Program of China (973 Program) (2013CB329303) and the National Natural Science Foundation of China (61502035).
Abstract: The performance of a machine translation system depends heavily on the quantity and quality of its bilingual language resources. However, obtaining a large-scale, high-quality parallel corpus is very difficult, especially for low-resource language pairs such as Chinese-Vietnamese. Fortunately, multilingual user-generated content (UGC), such as bilingual movie subtitles, makes automatic construction of parallel corpora possible. Although the amount of UGC parallel data can be considerable, the raw corpus is not suitable for statistical machine translation (SMT) systems, because it may contain translation errors, mismatched sentences, free translations, etc. To improve the quality of the bilingual corpus for SMT systems, three filtering methods are proposed, based on sentence length difference, the semantics of sentence pairs, and machine learning. Experiments are conducted on a Chinese-to-Vietnamese translation corpus. The results demonstrate that all three methods effectively improve corpus quality and that machine translation performance (BLEU score) can be improved by 1.32.
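The first of the three filters, sentence length difference, can be sketched as follows. The abstract does not say how length is measured or which threshold is used, so everything here is an assumption: Chinese length is counted in characters, Vietnamese length in whitespace-separated tokens, and the names `length_ratio`, `filter_by_length`, and the default `min_ratio=0.5` are hypothetical.

```python
from typing import List, Tuple


def length_ratio(zh: str, vi: str) -> float:
    """Ratio of the shorter to the longer sentence length, in [0, 1].

    Assumption: Chinese length = number of non-space characters,
    Vietnamese length = number of whitespace-separated tokens.
    """
    zh_len = len("".join(zh.split()))
    vi_len = len(vi.split())
    if zh_len == 0 or vi_len == 0:
        return 0.0
    return min(zh_len, vi_len) / max(zh_len, vi_len)


def filter_by_length(
    pairs: List[Tuple[str, str]],
    min_ratio: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[Tuple[str, str]]:
    """Keep only sentence pairs whose lengths are reasonably similar."""
    return [(zh, vi) for zh, vi in pairs if length_ratio(zh, vi) >= min_ratio]


if __name__ == "__main__":
    subtitle_pairs = [
        ("我爱你", "Tôi yêu bạn"),
        ("好", "Đây là một câu rất dài không khớp với bản gốc"),
    ]
    for zh, vi in filter_by_length(subtitle_pairs):
        print(zh, "|", vi)
```

In practice the threshold would be tuned on held-out data, and a length filter of this kind would be combined with the semantic and machine-learning filters the abstract mentions.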
Funding: This work was supported by the National Social Science Foundation of China (13BYY072).
Abstract: The processing of relative clauses has received much attention from linguists. The finding that object relatives are easier to process than subject relatives in Chinese challenges the notion that subject relative clauses are universally preferred. A large body of literature offers theories of sentence processing mechanisms for native speakers but leaves one area relatively untouched: how bilinguals process sentences. This study examines how individuals with Chinese as their L1 process subject-extracted subject relative clauses (SS) and subject-extracted object relative clauses (SO), using event-related potentials (ERPs) to probe real-time language processing and to provide a direct manifestation of brain activity. The findings support the subject relative clause preference, which is attributed to the strong influence of English relative clause markedness and to bilinguals' relatively lower working memory capacity.