Abstract
Word sense disambiguation (WSD) has been a very active research topic in natural language processing (NLP). It studies how sense classifiers can determine which sense of an ambiguous (polysemous) word is invoked in a particular context. This paper presents an unsupervised word sense disambiguation technique and describes the design and implementation of a full-text word sense tagging system for Chinese. The system performs disambiguation with a Naïve Bayesian model and uses large-scale corpora as training data, which largely overcomes the problem of data sparseness in knowledge acquisition. It also offers high tagging accuracy and fast running speed, which makes it suitable for word sense tagging of large-scale, real-world text.
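The Naïve Bayesian decision rule behind such a sense classifier picks, for a context C = {w1, ..., wn}, the sense s that maximizes log P(s) + Σ log P(wi | s). The following is a minimal Python sketch of that rule, not the authors' implementation. The class name `NaiveBayesWSD`, the smoothing parameter `alpha`, and the toy "finance"/"river" data are hypothetical. The sketch assumes that (sense, context-words) training pairs are already available and does not show how the paper acquires them without supervision. Add-alpha (Laplace) smoothing is used here as one common way to keep an unseen word-sense pair from zeroing out a sense's score; the paper's own treatment of sparse data may differ.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Hypothetical Naive Bayes sense classifier with add-alpha smoothing.

    Training data are (sense, context_words) pairs; how such pairs are
    obtained without manual annotation is outside the scope of this sketch.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant (assumed > 0)
        self.sense_counts = Counter()            # c(s): examples per sense
        self.word_counts = defaultdict(Counter)  # c(w, s): word counts per sense
        self.vocab = set()                       # all context words seen in training

    def fit(self, examples):
        for sense, context in examples:
            self.sense_counts[sense] += 1
            for w in context:
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        return self

    def log_score(self, sense, context):
        """log P(s) + sum of log P(w | s) over known context words."""
        total = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total)
        denom = sum(self.word_counts[sense].values()) + self.alpha * len(self.vocab)
        for w in context:
            if w not in self.vocab:
                continue  # skip words never seen in training
            # Smoothing gives unseen (w, s) pairs a small nonzero probability.
            score += math.log((self.word_counts[sense][w] + self.alpha) / denom)
        return score

    def predict(self, context):
        """Return the sense with the highest posterior score."""
        return max(self.sense_counts, key=lambda s: self.log_score(s, context))


# Toy, made-up training data for the English word "bank".
examples = [
    ("finance", ["money", "loan", "deposit"]),
    ("finance", ["money", "account"]),
    ("river", ["water", "shore"]),
    ("river", ["fish", "water", "shore"]),
]
model = NaiveBayesWSD(alpha=1.0).fit(examples)

print(model.predict(["loan", "money"]))  # → finance
print(model.predict(["water", "fish"]))  # → river
```

Working in log space avoids floating-point underflow when a context contains many words, and because the logarithm is monotonic the argmax over senses is unchanged.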
Source
Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》)
EI
CAS
CSCD
Peking University core journal
2005, No. 12, pp. 1603-1605, 1649 (4 pages in total)
Funding
Key Project of the National Natural Science Foundation of China (60435020)
Keywords
word sense disambiguation
natural language processing
unsupervised learning algorithm
Naïve Bayesian model
dependency grammar