Abstract
Part-of-speech (POS) ambiguity, where one word can belong to several parts of speech, is the key problem in automatic POS tagging. In particular, the accuracy with which the POS of unknown (out-of-vocabulary) words is determined has a strong effect on overall tagging performance. This paper studies methods for disambiguating such multi-category words. After comparing the respective strengths and limitations of statistical and rule-based methods, it proposes an automatic tagging method that combines a hidden Markov model (HMM) with transformation-based error-driven learning (TBL). It then shows how this method can perform POS tagging and guess the POS of unknown words under the restrictive condition that only a lexicon is available. Experimental results show that the method effectively improves tagging accuracy on unknown words.
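As a rough illustration of the hybrid idea described in the abstract, the Python sketch below decodes a bigram HMM with the Viterbi algorithm and then uses a Brill-style transformation-based (error-driven) learner to patch the HMM's output. It is not the authors' implementation. The add-one smoothing, the treatment of unknown words (allowed any tag from a caller-supplied open-class set and scored by tag transitions alone), the single rule template "change tag a to b when the previous tag is c", every function name and the toy corpus are assumptions made for illustration.

```python
# Illustrative sketch only: names, smoothing, unknown-word handling and the
# rule template are assumptions, not the method of the paper.
import math
from collections import Counter, defaultdict

START = "<s>"  # sentence-start pseudo-tag


def train_hmm(tagged_sents):
    """Collect bigram HMM counts from sentences of (word, tag) pairs."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]; also serves as the lexicon
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def viterbi(words, trans, emit, unk_tags):
    """Most likely tag sequence for a non-empty sentence. Known words may only
    take tags seen with them in training; unknown words may take any tag in
    unk_tags and are scored by transitions alone."""
    vocab = {w for counts in emit.values() for w in counts}
    all_tags = list(emit)

    def lp_trans(prev, tag):  # add-one smoothed log P(tag | prev)
        counts = trans.get(prev, Counter())
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(all_tags)))

    def lp_emit(tag, word):  # add-one smoothed log P(word | tag)
        if word not in vocab:
            return 0.0  # unknown word: constant score, context decides
        counts = emit[tag]
        return math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))

    def candidates(word):
        if word in vocab:
            return [t for t in all_tags if emit[t][word] > 0]
        return list(unk_tags)

    # Each column maps tag -> (best log score, backpointer to previous tag).
    columns = [{t: (lp_trans(START, t) + lp_emit(t, words[0]), None)
                for t in candidates(words[0])}]
    for word in words[1:]:
        prev_col, col = columns[-1], {}
        for t in candidates(word):
            score, back = max((prev_col[p][0] + lp_trans(p, t), p) for p in prev_col)
            col[t] = (score + lp_emit(t, word), back)
        columns.append(col)

    tag = max(columns[-1], key=lambda t: columns[-1][t][0])
    path = [tag]
    for col in reversed(columns[1:]):  # follow backpointers to position 0
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


def apply_rule(tags, rule):
    """Rule (a, b, c): change tag a to b when the previous tag is c.
    Conditions are checked on the input tags (simultaneous application)."""
    a, b, c = rule
    return [b if i > 0 and t == a and tags[i - 1] == c else t
            for i, t in enumerate(tags)]


def net_gain(rule, pred, gold):
    """Errors fixed minus errors introduced by applying rule everywhere."""
    gain = 0
    for p, g in zip(pred, gold):
        new = apply_rule(p, rule)
        gain += sum(x == y for x, y in zip(new, g)) - sum(x == y for x, y in zip(p, g))
    return gain


def learn_rules(pred, gold, max_rules=10):
    """Greedy error-driven learning: repeatedly keep the candidate rule with
    the largest positive net gain on the current tagging, then apply it."""
    rules = []
    for _ in range(max_rules):
        cands = sorted({(p[i], g[i], p[i - 1])
                        for p, g in zip(pred, gold)
                        for i in range(1, len(p)) if p[i] != g[i]})
        if not cands:
            break
        best = max(cands, key=lambda r: net_gain(r, pred, gold))
        if net_gain(best, pred, gold) <= 0:
            break
        rules.append(best)
        pred = [apply_rule(p, best) for p in pred]
    return rules


def tag(words, hmm, unk_tags, rules):
    """Hybrid tagger: HMM/Viterbi first pass, then learned rules in order."""
    trans, emit = hmm
    tags = viterbi(words, trans, emit, unk_tags)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Toy usage (hypothetical data): "香蕉" is unknown to the lexicon.
train = [[("我", "r"), ("喜欢", "v"), ("苹果", "n")],
         [("他", "r"), ("喜欢", "v"), ("音乐", "n")]]
hmm = train_hmm(train)
print(tag(["他", "喜欢", "香蕉"], hmm, ["n", "v"], []))  # ['r', 'v', 'n']
```

The sketch assumes a small hand-tagged corpus for estimating the HMM and for learning rules (by comparing `viterbi` output with gold tags on held-out sentences), which is a simplification of the lexicon-only setting described in the abstract. Restricting known words to the tags recorded for them keeps the HMM's search small, while the rule pass corrects systematic context errors that the bigram statistics miss.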
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2008, No. 6, pp. 1532-1534 (3 pages)
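Funding
Key Project of the Tianjin Municipal Science and Technology Research Program (04310731R)
Tianjin Normal University Youth Fund Project (52LE69)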