Abstract: An N-gram Chinese language model that incorporates linguistic rules is presented. Rule information is incorporated into the statistical framework by constructing an element lattice. To facilitate the hybrid modeling, novel methods are introduced: MI-based rule evaluation, weighted rule quantification, and element-based n-gram probability approximation. A dynamic Viterbi algorithm is adopted to search for the best path in the lattice. To strengthen the model, transformation-based error-driven rule learning is adopted. Applied to Chinese Pinyin-to-character conversion, the proposed model achieves high accuracy, flexibility, and robustness simultaneously. Tests show that the correct rate reaches 94.81%, compared with 90.53% for the bi-gram Markov model alone. Many long-distance dependencies and recursive structures in language can be processed effectively.
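The best-path search mentioned in the abstract can be illustrated with a minimal sketch. It covers only the plain bi-gram baseline (the 90.53% system), not the authors' rule-augmented element lattice, whose details the abstract does not give. The candidate table, the probabilities, the back-off floor, and the names `CANDIDATES`, `BIGRAM`, `FLOOR`, `log_p`, and `viterbi` are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical toy lattice: candidate characters for each pinyin syllable.
CANDIDATES = {
    "zhong": ["中", "种"],
    "guo": ["国", "过"],
}

# Hypothetical bi-gram probabilities P(cur | prev); "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "中"): 0.6,
    ("<s>", "种"): 0.4,
    ("中", "国"): 0.9,
    ("中", "过"): 0.1,
    ("种", "国"): 0.2,
    ("种", "过"): 0.8,
}

FLOOR = 1e-6  # assumed probability for unseen bi-grams (not from the paper)


def log_p(prev, cur):
    """Log-probability of cur following prev under the toy bi-gram table."""
    return math.log(BIGRAM.get((prev, cur), FLOOR))


def viterbi(syllables):
    """Return (best character string, its log-probability) for a syllable list."""
    # best[c] = (log-prob of the best path ending in c, that path as a list)
    best = {"<s>": (0.0, [])}
    for syl in syllables:
        nxt = {}
        for cur in CANDIDATES[syl]:
            # Keep only the highest-scoring predecessor for each candidate.
            score, path = max(
                ((s + log_p(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    score, path = max(best.values(), key=lambda t: t[0])
    return "".join(path), score


if __name__ == "__main__":
    print(viterbi(["zhong", "guo"]))
```

With this toy table, the path 中国 wins with probability 0.6 × 0.9 = 0.54. A hybrid model along the lines the abstract describes would presumably add rule-derived elements as extra lattice nodes with their own scores, but that extension is not shown here.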
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61273320, 61331011, and 61070123, and the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011102.
Abstract: This paper puts forward and explores the problem of empty element (EE) recovery in Chinese from the syntactic parsing perspective, which has been largely ignored in the literature. First, we demonstrate why EEs play a critical role in the syntactic parsing of Chinese and how re-categorizing EEs from the syntactic perspective lets them better benefit parsing. Then, we propose two ways to automatically recover EEs: a joint constituent parsing approach and a chunk-based dependency parsing approach. Evaluation on the Chinese TreeBank (CTB) 5.1 corpus shows that integrating EE recovery into the Charniak parser achieves a significant performance improvement of 1.29 in F1-measure. To the best of our knowledge, this is the first close examination of EEs in the syntactic parsing of Chinese, a problem that deserves more attention in the future given its importance.
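For readers unfamiliar with the reported metric, the sketch below computes precision, recall, and F1 over labeled constituent spans. It assumes labeled-bracket scoring, which the abstract does not specify, and it simplifies by treating brackets as sets rather than multisets. The function name `bracket_f1` and the example spans are hypothetical.

```python
def bracket_f1(gold, predicted):
    """Precision, recall, and F1 over (label, start, end) constituent spans.

    Simplification: duplicate brackets are collapsed because sets are used.
    """
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0, 0.0, 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical spans: the predicted VP starts one token too late.
    gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
    predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 3, 5)]
    print(bracket_f1(gold, predicted))
```

Here two of three brackets match on each side, so precision, recall, and F1 all equal 2/3.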