Abstract
Automatic gender identification of Tibetan personal names is a fundamental problem in natural language processing. This paper proposes a gender identification method for Tibetan personal names based on an SVM model that incorporates syllable features. The method uses the support vector machine (SVM) as its basic framework and designs feature templates from the word-formation characteristics of Tibetan names and from statistical analysis, so that the SVM can handle the gender identification task effectively. In the experiments, 3,764 Tibetan names were randomly sampled as the test corpus from 103,974 sentences containing 18,821 Tibetan names, and four commonly used SVM kernel functions were evaluated: the Gaussian, linear, polynomial, and sigmoid (S-shaped) kernels. The gender identification accuracy reached 99.98%, 98.81%, 96.98%, and 95.45%, respectively.
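The abstract does not list the feature templates themselves, so the following is only a minimal sketch of what syllable-level features for a Tibetan name might look like. It assumes that syllables are delimited by the tsheg mark (U+0F0B); the function names `split_syllables` and `syllable_features` and the particular templates (syllable count, positional syllables, first and last syllable, adjacent syllable pairs) are hypothetical illustrations, not the authors' design.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def split_syllables(name):
    """Split a Tibetan name into syllables on the tsheg mark."""
    return [s for s in name.strip().split(TSHEG) if s]


def syllable_features(name):
    """Build a sparse feature dict from hypothetical syllable templates."""
    syllables = split_syllables(name)
    features = {"n_syllables": len(syllables)}
    # Template 1 (assumed): each syllable together with its position.
    for i, syl in enumerate(syllables):
        features[f"syl[{i}]={syl}"] = 1
    # Template 2 (assumed): first and last syllable of the name.
    if syllables:
        features[f"first={syllables[0]}"] = 1
        features[f"last={syllables[-1]}"] = 1
    # Template 3 (assumed): adjacent syllable pairs.
    for a, b in zip(syllables, syllables[1:]):
        features[f"bigram={a}+{b}"] = 1
    return features


if __name__ == "__main__":
    # A two-syllable name written with Unicode escapes (sgrol + tsheg + ma).
    example = "\u0f66\u0f92\u0fb2\u0f7c\u0f63\u0f0b\u0f58"
    for key, value in syllable_features(example).items():
        print(key, value)
```

Feature dicts of this kind could then be vectorized and passed to any SVM implementation. In scikit-learn, for instance, `DictVectorizer` followed by `SVC` with `kernel` set to `"rbf"`, `"linear"`, `"poly"`, or `"sigmoid"` would correspond to the four kernel types compared in the abstract, although the abstract does not say which toolkit was used.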
Source
《西北民族大学学报(自然科学版)》
Journal of Northwest Minzu University (Natural Science)
2017, Issue 3, pp. 1-5 (5 pages)
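Funding
Qinghai Provincial Science and Technology Program Project (2017-GX-146)
Qinghai Normal University Research Fund for Young and Middle-aged Scholars (17ZR11)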
Keywords
Tibetan personal names
Gender identification
Syllable features
Support vector machine (SVM)