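Abstract
This paper takes multilingual parallel text and speech corpora of Uyghur, Kazakh, Kyrgyz, and Mongolian, all languages of the Altaic family, as its object of study. It compares the similarity of the quantized sequence vectors of the multilingual texts and of the acoustic and prosodic features of the speech, in order to investigate what the languages' information has in common. Experiments show that agglutinative languages of the same family and branch are highly similar: text similarity reaches 85%, and acoustic feature similarity reaches 95%. This confirms the feasibility of building a shared cloud model (共享云模) of language information across multiple agglutinative languages of the same family, which would facilitate cross-lingual conversion of text and speech information and greatly reduce the cost of information processing for minority languages.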
This paper investigates the similarity among agglutinative languages of the same family, here the Altaic family (Uyghur, Kazakh, Kyrgyz, and Mongolian, which are used in different countries and regions). Cosine similarity is computed over parallel texts and over acoustic features extracted from speech sentences with the same content spoken by speakers of the different languages. Experimental results show that word-to-word conversion between these languages becomes more feasible when the rules connecting a stem and an affix (function words) are learned at the word level and common acoustic models are used. This avoids the uphill work of building machine translation for resource-deficient languages, such as minority languages used in developing countries, and it reduces costs.
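The cosine similarity measure named in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the text or acoustic feature vectors are built, so the function name `cosine_similarity` and the two four-dimensional vectors below are hypothetical placeholders, not the authors' data or implementation.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors for the same sentence in two languages
# (illustrative values only, not taken from the paper).
uyghur_vec = [3.0, 1.0, 0.0, 2.0]
kazakh_vec = [2.0, 1.0, 1.0, 2.0]

similarity = cosine_similarity(uyghur_vec, kazakh_vec)
print(f"cosine similarity: {similarity:.4f}")
```

A value close to 1 means the two vectors point in nearly the same direction. Because the measure ignores vector magnitude, it is a common choice when sentences or recordings differ in length.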
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2013, Issue 6, pp. 180-186 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 61163030)