Abstract
Based on hidden Markov model (HMM) theory, this paper presents a Chinese part-of-speech tagging model that combines trigram lexical probabilities with trigram tag (contextual) probabilities, that is, second-order approximations for both, and extends the traditional Viterbi algorithm accordingly. The model uses more contextual information than standard statistical models. To address the sparse-data problem in the statistical model, a smoothing algorithm based on linear interpolation is introduced. Experiments show that the full second-order HMM achieves higher part-of-speech tagging accuracy and disambiguation accuracy than the standard bigram and trigram models.
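The abstract names three pieces: second-order contextual probabilities, second-order lexical probabilities, and a Viterbi search extended to match. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the contextual term is P(t_i | t_{i-2}, t_{i-1}) and the lexical term is P(w_i | t_{i-1}, t_i). It also assumes each is smoothed by linear interpolation with lower-order estimates using fixed, hand-picked weights (the paper's actual weight estimation is not described here). The class name `SecondOrderHMM`, the padding tag `START`, and the `FLOOR` probability for unseen events are hypothetical. Because the lexical term depends on two tags, the Viterbi state is the tag pair (t_{i-1}, t_i) rather than a single tag.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical padding tag for the two positions before a sentence
FLOOR = 1e-10  # hypothetical floor so zero-probability events stay finite in log space


def _log(p):
    return math.log(p) if p > 0 else math.log(FLOOR)


class SecondOrderHMM:
    """Sketch of a full second-order HMM tagger:
    contextual P(t_i | t_{i-2}, t_{i-1}) and lexical P(w_i | t_{i-1}, t_i),
    each linearly interpolated with lower-order relative frequencies."""

    def __init__(self, tagged_sents, lam_ctx=(0.1, 0.3, 0.6), lam_lex=(0.3, 0.7)):
        # Assumed fixed weights (unigram, bigram, trigram) and (P(w|t), P(w|t1,t));
        # each tuple sums to 1 so the interpolated estimate stays a distribution.
        self.lam_ctx, self.lam_lex = lam_ctx, lam_lex
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.h1, self.h2 = Counter(), Counter()  # counts of tag histories
        self.wt, self.wtt = Counter(), Counter()  # word/tag co-occurrence counts
        for sent in tagged_sents:
            tags = [START, START] + [t for _, t in sent]
            for i, (w, t) in enumerate(sent, start=2):
                t2, t1 = tags[i - 2], tags[i - 1]
                self.uni[t] += 1
                self.bi[(t1, t)] += 1
                self.tri[(t2, t1, t)] += 1
                self.h1[t1] += 1
                self.h2[(t2, t1)] += 1
                self.wt[(w, t)] += 1
                self.wtt[(w, t1, t)] += 1
        self.n = sum(self.uni.values())
        self.tags = sorted(self.uni)

    @staticmethod
    def _ratio(num, den):
        return num / den if den else 0.0

    def p_ctx(self, t2, t1, t):
        """Interpolated contextual probability P(t | t2, t1)."""
        l1, l2, l3 = self.lam_ctx
        return (l1 * self.uni[t] / self.n
                + l2 * self._ratio(self.bi[(t1, t)], self.h1[t1])
                + l3 * self._ratio(self.tri[(t2, t1, t)], self.h2[(t2, t1)]))

    def p_lex(self, w, t1, t):
        """Interpolated lexical probability P(w | t1, t)."""
        l1, l2 = self.lam_lex
        return (l1 * self._ratio(self.wt[(w, t)], self.uni[t])
                + l2 * self._ratio(self.wtt[(w, t1, t)], self.bi[(t1, t)]))

    def viterbi(self, words):
        """Best tag sequence; each state is the tag pair (t_{i-1}, t_i)."""
        delta = {(START, START): (0.0, [])}  # state -> (log score, tag path)
        for w in words:
            new = {}
            for (t2, t1), (score, path) in delta.items():
                for t in self.tags:
                    s = score + _log(self.p_ctx(t2, t1, t)) + _log(self.p_lex(w, t1, t))
                    # Keep only the best predecessor for each new state (t1, t).
                    if (t1, t) not in new or s > new[(t1, t)][0]:
                        new[(t1, t)] = (s, path + [t])
            delta = new
        return max(delta.values(), key=lambda v: v[0])[1]


if __name__ == "__main__":
    corpus = [
        [("我", "r"), ("爱", "v"), ("北京", "ns")],
        [("他", "r"), ("爱", "v"), ("学习", "v")],
        [("学习", "v"), ("很", "d"), ("重要", "a")],
    ]
    hmm = SecondOrderHMM(corpus)
    print(hmm.viterbi(["他", "爱", "北京"]))  # ['r', 'v', 'ns']
```

Tracking the pair (t_{i-1}, t_i) multiplies the state space by the tagset size, which is the cost of letting both the contextual and the lexical terms see two tags. Interpolating with lower-order estimates keeps unseen trigrams from zeroing out otherwise plausible paths.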
Source
Computer Engineering (《计算机工程》), 2005, No. 10, pp. 177-179 (3 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
Supported by the National Natural Science Foundation of China