Abstract
Corpora are the foundation of all natural language processing. As applications such as machine translation and speech recognition continue to grow, building high-quality, large-scale, standardized corpora has become especially important. Corpus construction for China's ethnic minority languages began in the 1980s and 1990s and has since produced many results. This paper introduces and evaluates the current state of minority-language corpus construction in China and the related research, focusing on corpora for Mongolian, Uyghur, and Tibetan. Building on this review, it offers several suggestions for addressing problems in minority-language corpus construction, with the aim of providing a reference for building corpora for other ethnic minority languages.
Authors
Fei Delian (费德莲)
Yuan Lingyun (袁凌云)
Quan Chaochen (权朝臣)
Yunnan Normal University, Kunming 650500, China
Source
Wireless Internet Technology (《无线互联科技》), 2019, No. 19, pp. 77-79 (3 pages)
Keywords
ethnic minority languages
corpus construction
Mongolian
Uyghur
Tibetan