Funding: Supported by the National Natural Science Foundation of China (No. 60435020).
Abstract: Word sense disambiguation (WSD) is the task of deciding the sense of an ambiguous word in a particular context. Most current studies on WSD use only a few ambiguous words as test samples, which limits their practical application. In this paper, we study WSD on a large-scale real-world corpus using two unsupervised learning algorithms, one based on a ±n-improved Bayesian model and one based on a Dependency Grammar (DG)-improved Bayesian model. The ±n-improved classifier narrows the context window around the ambiguous word through close-distance feature extraction and reduces interference from uninformative features, which improves accuracy to 83.18% in the open test. The DG-improved classifier more effectively overcomes the noise present in the naive Bayesian classifier. Experimental results show that this approach performs better on Chinese WSD, achieving an accuracy of 86.27% in the open test.
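The ±n context window described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes unsupervised algorithms, whereas this toy uses a few hand-labeled English examples and a plain naive Bayes classifier with add-alpha smoothing, purely to show how limiting features to the n words on either side of the target works. The names `window_features` and `WindowNaiveBayes`, the smoothing scheme, and the toy data are all assumptions introduced for illustration.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, index, n):
    """Return the words within n positions of tokens[index], excluding the target."""
    left = tokens[max(0, index - n):index]
    right = tokens[index + 1:index + 1 + n]
    return left + right


class WindowNaiveBayes:
    """Toy naive Bayes sense classifier over a +/-n context window (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                              # window half-width
        self.alpha = alpha                      # add-alpha smoothing constant (assumed)
        self.sense_counts = Counter()           # number of examples per sense
        self.word_counts = defaultdict(Counter) # context word counts per sense
        self.vocab = set()

    def train(self, examples):
        # Each example is (tokens, index of the ambiguous word, sense label).
        for tokens, index, sense in examples:
            self.sense_counts[sense] += 1
            for word in window_features(tokens, index, self.n):
                self.word_counts[sense][word] += 1
                self.vocab.add(word)

    def predict(self, tokens, index):
        total = sum(self.sense_counts.values())
        # One extra vocabulary slot reserves probability mass for unseen words.
        vocab_size = len(self.vocab) + 1
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            for word in window_features(tokens, index, self.n):
                score += math.log((self.word_counts[sense][word] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy data for the ambiguous word "bank".
examples = [
    (["deposit", "money", "in", "the", "bank", "account", "today"], 4, "finance"),
    (["the", "river", "bank", "was", "muddy"], 2, "river"),
    (["she", "opened", "a", "bank", "account"], 3, "finance"),
    (["fishing", "on", "the", "river", "bank", "at", "dawn"], 4, "river"),
]
clf = WindowNaiveBayes(n=2)
clf.train(examples)
print(clf.predict(["a", "new", "bank", "account", "form"], 2))
```

A smaller `n` discards distant words that are less likely to bear on the sense, which is the intuition behind the reduced window; the best value of `n` would have to be determined empirically on real data.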
Funding: Supported by the National High-Technology Development Program of China (No. 863-306-2D03-01-2).
Abstract: A new tagging method is presented for building a Chinese semantic corpus. The method characterizes sentence meaning as a linear sequence of dependency relationships, which are the semantic or syntactic relationships between words in the sentence. This representation is used to build a Chinese statistical parser model for understanding sentence meaning. Experiments on automatic telephone switchboard conversations show that the proposed parser has a precision of 80%. This work provides a foundation for building a large-scale Chinese semantic corpus and for research on modeling the understanding of the Chinese language.
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2004AA114011-2).
Abstract: This paper presents two language models that use a Chinese semantic dependency parsing technique for speech recognition. The models are based on a representation of Chinese semantic structure with dependency relations. A semantic dependency parser is described that automatically tags the semantic class of each word with 90.9% accuracy and parses the semantic dependency structure of a sentence with 75.8% accuracy. The Chinese semantic parsing technique is applied to structured language modeling to develop two language models, the semantic dependency model (SDM) and the headword trigram model (HTM). These language models were evaluated on Chinese speech recognition. The experiments show that both models outperform the word trigram model in terms of Chinese character recognition error rate.