Abstract
A part-of-speech (POS) tag set is the tool a computer uses to represent word classes when processing natural language, and the POS tagging of any natural language must be based on such a tag set. Driven by the practical needs of informatization for square Hmong characters, and taking into account their character-formation principles and the way their words are used, this paper first introduces the concepts of POS tagging and POS tag sets. Then, drawing on the design methods of Chinese POS tagging specifications, it determines the basic parts of speech and their subcategories for square Hmong characters, and designs POS tagging symbols together with a classification tagging system based on grammatical categories. The result is a preliminary POS tag set for square Hmong character information processing, which to some extent provides a reference standard for POS tagging of square Hmong characters.
Authors
ZHOU Tan (周潭)
MO Liping (莫礼平)
ZENG Hu (曾虎)
LEI Zhi (雷智)
LI Wenyu (李文宇)
WU Ying (吴莹)
College of Information Science & Engineering, Jishou University, Jishou, Hunan 416000, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2019, No. 1, pp. 131-134 (4 pages)
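Funding
National Natural Science Foundation of China (61462029)
Jishou University Undergraduate Research Project (JDX17027, 2018JDX09)
College Students' Research-Based Learning and Innovative Experiment Program (Xiang Jiao Tong [2018] No. 255, 599; Jishou University Jiao Tong 2018 [15], JDCX2018012)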
Keywords
natural language processing
square Hmong character
part-of-speech tagging
part-of-speech tag set