Abstract
This paper sets up the SRILM language-modeling toolkit in a Linux environment and splits the corpus into blocks. SRILM's ngram-count and ngram tools are then used to collect n-gram counts and build the language models. Several smoothing algorithms are evaluated by measuring perplexity, and the resulting perplexity values are compared and analyzed to identify the smoothing method best suited to the current corpus and language setting.
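As a rough illustration of the evaluation the abstract describes, the sketch below computes the perplexity of a bigram model under add-k smoothing. It is not SRILM and not the paper's setup: the function names, the `<unk>` handling, the toy sentences, and the choice of add-k smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigram histories and bigrams over tokenized sentences (hypothetical helper)."""
    history, bigrams = Counter(), Counter()
    vocab = {"<unk>"}  # reserve one slot for unseen test tokens
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens[1:])  # predicted tokens: words and </s>, never <s>
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams, vocab

def perplexity(sentences, history, bigrams, vocab, k=1.0):
    """Perplexity of the sentences under add-k smoothed bigram probabilities."""
    v = len(vocab)
    log_prob, n = 0.0, 0
    for sent in sentences:
        # Map out-of-vocabulary words to <unk> so the distribution stays normalized.
        words = [w if w in vocab else "<unk>" for w in sent]
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, cur)] + k) / (history[prev] + k * v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

# Toy data standing in for a real training and test split.
train = [["a", "b"], ["a", "c"]]
history, bigrams, vocab = train_bigram(train)
for k in (1.0, 0.1):
    print(k, perplexity([["a", "b"]], history, bigrams, vocab, k=k))
```

Lower perplexity means the model assigns higher probability to the held-out text, which is the basis on which smoothing methods are compared.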
Author
REN Qing-ji (仁青吉)
Tibetan Intangible Cultural Heritage Key Laboratory, Gansu Normal University for Nationalities, Hezuo 747000, China
Source
Journal of Northwest Minzu University (Natural Science) (《西北民族大学学报(自然科学版)》)
2019, No. 4, pp. 26-30 (5 pages)
Keywords
Tibetan language model
N-gram
Smoothing algorithms
Perplexity