On Contemporary Mongolian Segmentation Specification for Information Processing (论语料库用现代蒙古文标注规范)
Abstract: Annotation of contemporary Mongolian corpora is currently guided by the Specifications for Contemporary Mongolian Corpus Annotation (《现代蒙古语语料库标注规范》). That specification, however, leaves gaps in the transliteration rules for non-Mongolian characters, proper nouns, and loanwords, and it does not yet treat in detail the tagging of units that consist of more than one character. The specification presented here is intended for information processing. Drawing on the features and regularities of contemporary Mongolian, it studies the rules for merging or segmenting tagging units in Mongolian corpora. It follows Mongolian Word and Expression Marks for Information Processing (《信息处理用蒙古文词语标记》), jointly drafted by the China Electronics Standardization Institute and other organizations, and the Specifications for Contemporary Mongolian Corpus Annotation drawn up by Inner Mongolia University. The work will need continued refinement on the basis of large-scale corpora.
Author: Tonglaga (通拉嘎)
Source: Journal of Inner Mongolia Minzu University: Social Sciences (《内蒙古民族大学学报(社会科学版)》), 2014, No. 4, pp. 35-40 (6 pages)
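Funding: One of the research outputs of the National Social Science Fund of China project "A Study of the State of Internet Development for China's Minority Languages" ("中国少数民族语言互联网络发展状况的研究", Project No. 11CYY016)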
Keywords: information processing; contemporary Mongolian language; annotation specification