摘要
Word Sense Disambiguation (WSD) is to decide the sense of an ambiguous word on particular context. Most of current studies on WSD only use several ambiguous words as test samples, thus leads to some limitation in practical application. In this paper, we perform WSD study based on large scale real-world corpus using two unsupervised learning algorithms based on ±n-improved Bayesian model and Dependency Grammar (DG)-improved Bayesian model. ±n-improved classifiers reduce the window size of context of ambiguous words with close-distance feature extraction method, and decrease the jamming of useless features, thus obviously improve the accuracy, reaching 83.18% (in open test). DG-improved classifier can more effectively conquer the noise effect existing in Naive-Bayesian classifier. Experimental results show that this approach does better on Chinese WSD, and the open test achieved an accuracy of 86.27%.
Word Sense Disambiguation (WSD) is to decide the sense of an ambiguous word on particular context. Most of current studies on WSD only use several ambiguous words as test samples, thus leads to some limitation in practical application. In this paper, we perform WSD study based on large scale real-world corpus using two unsupervised learning algorithms based on ±n-improved Bayesian model and Dependency Grammar (DG)-improved Bayesian model. ±n-improved classifiers reduce the window size of context of ambiguous words with close-distance feature extraction method, and decrease the jamming of useless features, thus obviously improve the accuracy, reaching 83.18% (in open test). DG-improved classifier can more effectively conquer the noise effect existing in Naive-Bayesian classifier. Experimental results show that this approach does better on Chinese WSD, and the open test achieved an accuracy of 86.27%.
基金
Supported by the National Natural Science Foundation of China (No.60435020).