Abstract
A key problem for statistical word sense disambiguation models is how to acquire sense indicator words automatically from a corpus. Learning from initial collocation examples can yield more collocation knowledge from the corpus, but selecting good initial collocations by hand is difficult and does not guarantee that the collocation knowledge grows effectively. To address this, the paper proposes selecting the best seeds by machine learning over the initial collocation examples, expanding those seeds into further indicator words, and then using the acquired indicators to disambiguate polysemous words with multiple senses. In a test on 8 polysemous words, the method achieved an average accuracy of 87.7%.
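The pipeline summarized in the abstract (select seeds from initial collocation examples, expand them into more indicators, then disambiguate) can be illustrated with a minimal sketch. The abstract does not state the paper's actual selection criterion, so everything concrete here is an assumption made for illustration: the purity and frequency thresholds, the voting rule, the function names, and the toy English data for the word "bank".

```python
from collections import Counter, defaultdict

def select_seeds(labeled, min_purity=1.0, min_count=2):
    """Keep context words that are frequent enough and tied to one sense.

    labeled: list of (context_words, sense) pairs.
    The thresholds are illustrative assumptions, not the paper's criterion.
    """
    by_word = defaultdict(Counter)
    for words, sense in labeled:
        for w in set(words):
            by_word[w][sense] += 1
    seeds = {}
    for w, counts in by_word.items():
        sense, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_count and hits / total >= min_purity:
            seeds[w] = sense
    return seeds

def classify(words, indicators):
    """Vote over indicator words in a context; None on no evidence or a tie."""
    votes = Counter(indicators[w] for w in set(words) if w in indicators)
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

def expand(seeds, unlabeled, rounds=3):
    """Grow the indicator set by labeling contexts and re-selecting words."""
    indicators = dict(seeds)
    for _ in range(rounds):
        auto = [(ws, classify(ws, indicators)) for ws in unlabeled]
        auto = [(ws, s) for ws, s in auto if s is not None]
        added = {w: s for w, s in select_seeds(auto).items()
                 if w not in indicators}
        if not added:
            break
        indicators.update(added)
    return indicators

# Hypothetical toy data: hand-labeled initial collocation examples.
LABELED = [
    (["river", "bank", "water"], "shore"),
    (["bank", "river", "fishing"], "shore"),
    (["bank", "loan", "money"], "finance"),
    (["money", "bank", "account"], "finance"),
]
# Hypothetical unlabeled corpus contexts containing the target word.
UNLABELED = [
    ["river", "bank", "mud"],
    ["mud", "bank", "river", "boat"],
    ["money", "bank", "interest"],
    ["interest", "bank", "money", "deposit"],
    ["bank", "mud", "boat"],
    ["bank", "interest", "deposit"],
]

seeds = select_seeds(LABELED)
indicators = expand(seeds, UNLABELED)
print(sorted(seeds.items()))
print(sorted(indicators.items()))
print(classify(["bank", "boat", "trip"], indicators))
```

In this toy run the target word itself never becomes an indicator, because it occurs with both senses and fails the purity threshold. Words such as "boat" and "deposit" are only picked up in a later round, after contexts labeled through earlier indicators make them frequent enough.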
Source
《中文信息学报》
CSCD
Peking University Core Journals list
2005, No. 1, pp. 30-35 (6 pages)
Journal of Chinese Information Processing
Funding
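National Natural Science Foundation of China (grant 10071028)
National Committee for Language and Script Application, "Tenth Five-Year Plan" research project on language and script application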
Keywords
artificial intelligence
natural language processing
word sense disambiguation
collocation
seed selection