Abstract
Building on a study of the maximum entropy model, this paper presents a maximum entropy algorithm suited to annotating Chinese text with pinyin. The algorithm handles the case in which a polyphonic character stands alone as a single-character word, so that every remaining word containing a polyphonic character has two or more characters. The algorithm was used to annotate an article randomly selected from Reader's Digest (《读者文摘》); the experiment shows a pinyin annotation accuracy of over 96.6%.
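As a rough illustration of the kind of model the abstract refers to, the sketch below trains a tiny conditional maximum entropy classifier that picks a reading for one polyphonic character from its neighbouring characters. It is not the authors' algorithm: the feature templates (previous and next character), the toy training contexts for "长", the gradient-ascent trainer, and all function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical toy data, invented for illustration: the (previous, next)
# characters around the polyphonic character "长" and its reading.
TRAIN = [
    (("很", "的"), "cháng"),
    (("多", "时"), "cháng"),
    (("很", "时"), "cháng"),
    (("校", "说"), "zhǎng"),
    (("成", "了"), "zhǎng"),
    (("市", "说"), "zhǎng"),
]
LABELS = ["cháng", "zhǎng"]


def features(context, label):
    """Binary features pairing each neighbouring character with a reading."""
    prev_ch, next_ch = context
    return [("prev", prev_ch, label), ("next", next_ch, label)]


def predict_proba(weights, context):
    """p(y | x) = exp(sum of active feature weights) / Z(x)."""
    scores = {y: sum(weights[f] for f in features(context, y)) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(data, epochs=200, lr=0.5):
    """Maximise conditional log-likelihood by plain batch gradient ascent."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for context, gold in data:
            probs = predict_proba(weights, context)
            # Gradient = observed feature counts minus expected counts.
            for f in features(context, gold):
                grad[f] += 1.0
            for y in LABELS:
                for f in features(context, y):
                    grad[f] -= probs[y]
        for f, g in grad.items():
            weights[f] += lr * g / len(data)
    return weights


weights = train(TRAIN)
probs = predict_proba(weights, ("市", "了"))
best = max(probs, key=probs.get)
print(best, probs)
```

A practical system would need word segmentation, richer context features, and regularisation, none of which are described in enough detail in this record to reproduce here.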
Source
Microelectronics & Computer (《微电子学与计算机》)
CSCD
Peking University Core Journals (北大核心)
2012, No. 8, pp. 120-122, 126 (4 pages)
Funding
Scientific Research Project of Inner Mongolia University of Technology (ZD201118)
Keywords
maximum entropy model
polyphonic characters
pinyin annotation
statistics
features
word segmentation