Abstract
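This work targets the recognition of Mongolian person names and implements automatic name recognition based on a conditional random field (CRF) model. Starting from an analysis of the agglutinative nature of Mongolian, it studies the forms in which person names occur in a Mongolian corpus and the characteristics of each type of name. Based on these characteristics, it adds Chinese-surname, name-dictionary, multi-category-name and double-root features on top of basic features such as lexical, part-of-speech and indicator-word features. Trained on the 1-million-word annotated corpus developed by Inner Mongolia University, the model reaches 94.56% precision, 90.60% recall and an F-score of 92.54%. The method obtains better results than earlier rule-based systems.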
This paper presents a method for recognizing Mongolian person names based on conditional random fields (CRF). Guided by the characteristics of Mongolian person names, it selects lexical features, part of speech, designation words, Chinese surnames, a name dictionary, multi-category names and double roots as the features of the model. Using the 3rd-level annotated corpus of about 1,000,000 words as training data, the model achieves a precision of 94.56%, a recall of 90.60% and an F-score of 92.54%. The method achieves better results than the previous rule-based system.
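The three reported figures can be cross-checked against each other. The sketch below assumes that the F value is the balanced F-score (the harmonic mean of precision and recall) and that the 94.56% figure is precision in that sense; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the abstract's F value uses this standard definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
reported_precision = 94.56
reported_recall = 90.60

print(round(f_score(reported_precision, reported_recall), 2))  # 92.54
```

Under that assumption the computed value agrees with the reported F value of 92.54%.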
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2016, No. 7, pp. 2014-2017 (4 pages)
Application Research of Computers
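Funding
Inner Mongolia Autonomous Region Special Support Project for Mongolian Language and Script Informatization (2012339)
National Natural Science Foundation of China (61070099)
National Social Science Foundation of China (13XYY022)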