Abstract
This paper proposes a three-layer hybrid method for Tibetan syllable segmentation that combines rules, a support vector machine (SVM), and a restoration step, drawing on the attachment patterns, structure, and contextual features of Tibetan case-auxiliary words. Syllable segmentation is the basis of Tibetan character-frequency statistics, word segmentation, part-of-speech tagging, and machine translation. Its main difficulty is the correct identification, segmentation, and restoration of ambiguous abbreviated case-auxiliary words. In the reported experiments, the hybrid method achieves an F-measure of 99.97%.
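As a rough illustration of the rule layer only, the sketch below splits text on the tsheg and shad marks and then flags syllables that end in a possible abbreviated case-auxiliary word. The suffix inventory, the split into "unambiguous" and "ambiguous" endings, and the names `split_syllables` and `analyze` are assumptions made for this example, not the authors' rule set. The SVM and restoration layers are not implemented; syllables flagged as ambiguous are simply where a classifier would be consulted.

```python
import re

TSHEG = "\u0f0b"  # ་  syllable delimiter
SHAD = "\u0f0d"   # །  clause/sentence mark

# Assumption: endings treated here as unambiguous abbreviated case particles.
UNAMBIGUOUS = ("\u0f60\u0f72", "\u0f60\u0f7c", "\u0f60\u0f58", "\u0f60\u0f44")  # འི འོ འམ འང
# Assumption: endings that may be a case particle or an ordinary suffix letter.
AMBIGUOUS = ("\u0f66", "\u0f62")  # ས ར

_DELIMS = re.compile("[" + TSHEG + SHAD + r"\s]+")


def split_syllables(text):
    """Split raw text into syllables on tsheg, shad, and whitespace."""
    return [s for s in _DELIMS.split(text) if s]


def analyze(syllable):
    """Return (stem, particle, status) for one syllable.

    status is "split" when an assumed-unambiguous particle is detached,
    "ambiguous" when the final letter needs a context-based decision,
    and "plain" otherwise.
    """
    for suffix in UNAMBIGUOUS:
        if syllable.endswith(suffix) and len(syllable) > len(suffix):
            return (syllable[: -len(suffix)], suffix, "split")
    if len(syllable) > 1 and syllable[-1] in AMBIGUOUS:
        return (syllable, None, "ambiguous")
    return (syllable, None, "plain")


if __name__ == "__main__":
    sample = "\u0f44\u0f60\u0f72\u0f0b\u0f51\u0f7a\u0f56\u0f0d"  # ངའི་དེབ།
    for syl in split_syllables(sample):
        print(syl, analyze(syl))
```

In this sketch ངའི is split into ང and འི, while a syllable such as ནས is only flagged as ambiguous, since its final ས could be an ordinary suffix letter; resolving such cases from context is the role the abstract assigns to the SVM layer.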
Authors
Cairangdangzhi (才让当知)
Huaquecairang (华却才让)
Quezuozhuoma (却措卓玛)
XIA Wu-ji (夏吾吉)
Affiliations: The Computer College of Qinghai Normal University, Xining 810016, China; Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province, Xining 810008, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China
Source
Journal of Inner Mongolia Normal University (Natural Science Edition) (《内蒙古师范大学学报(自然科学汉文版)》)
CAS
2019, Issue 5, pp. 406-412 (7 pages)
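Funding
National Social Science Fund of China project (17XYY030)
Qinghai Province Science and Technology Program project (2017-GX-146)
Qinghai Normal University Young and Middle-aged Researchers' Fund project (17ZR11)
Qinghai Province Key Laboratory projects (2013-Z-Y17, 2014-Z-Y32, 2015-Z-Y03)
Tibetan Information Processing and Machine Translation Key Laboratory (2013-Y-17)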
Keywords
syllable characteristics
abbreviated case-auxiliary words
ambiguous abbreviated case-auxiliary words
support vector machine (SVM)