
Technical Report on the Tibetan Word Segmentation Evaluation in MLWS2017 (cited by: 2)

Technical report on evaluation of Tibetan words in MLWS2017
Abstract: As the scale of language information processing data and the demand for automatic analysis grow, and as academic exchange at home and abroad increases, the research community has reached a consensus that public evaluations of natural language processing technology promote research. Lexical analysis is the foundation of, and the key to, language information processing. The National Language and Information Processing Committee and the Computational Linguistics Committee of the Chinese Information Processing Society of China jointly organized the "Minority Language Word Segmentation (MLWS2017)" evaluation, which assesses word segmentation technology on news text in Mongolian, Tibetan, and Uygur. This paper focuses on the Tibetan part of MLWS2017. It describes how the Tibetan word segmentation evaluation corpus was collected and collated, and analyzes the difficulty of automatic word segmentation for Tibetan news text. Building on a study of design ideas for segmentation evaluation software, the author designed a Tibetan evaluation analysis software and tested it in several ways to ensure its correctness. The paper then analyzes the results of the Tibetan word segmentation evaluation and verifies their correctness by other methods.
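Author: Gao Dingguo (高定国)
Source: Plateau Science Research (《高原科学研究》), 2017, No. 1, pp. 89-97 (9 pages)
Funding: National Natural Science Foundation of China (61331013); Major Program of the National Social Science Fund of China (15ZDB111); Tibet University Everest Scholars Talent Development Support Program (藏大字[2015]96号)
Keywords: MLWS2017; Tibetan information processing; word segmentation; Tibetan evaluation analysis software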
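
The abstract does not say which metrics the evaluation analysis software computes. As a rough illustration only, the sketch below scores a segmenter's output against a gold segmentation with precision, recall, and F1 over word spans, which is the conventional scoring in word segmentation evaluations. It assumes that MLWS2017 scores in a similar way. The function names `word_spans` and `segmentation_prf` and the placeholder tokens are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to the set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of a predicted segmentation.

    A predicted word counts as correct only if both of its boundaries
    match a word in the gold segmentation (assumed scoring rule).
    """
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted must cover the same text")
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder tokens stand in for Tibetan words; only "ab" is segmented correctly.
gold = ["ab", "c", "de"]
predicted = ["ab", "cd", "e"]
p, r, f = segmentation_prf(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Comparing character spans instead of word strings keeps a repeated word from being credited at the wrong position, which matters when the same word occurs several times in a sentence.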