The Chinese character library is one of the important data structures in a Chinese information processing system. The behavior of the whole system depends directly on how well the structure of this library is designed. This paper describes the structures of RAM-based Chinese character libraries, both static and dynamic, offers a method for describing this behavior, and examines some algorithms related to these structures.
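The abstract does not give the concrete layouts, so the following is only a minimal sketch of one common reading of "static" versus "dynamic" RAM-based libraries: a static library keeps the whole font resident and finds a glyph by offset, while a dynamic library keeps a bounded set of glyphs resident and loads the rest on demand. The class names, the 32-byte glyph size, the `load_glyph` callback, and the least-recently-used eviction policy are all assumptions made for illustration, not details taken from the paper.

```python
from collections import OrderedDict

GLYPH_BYTES = 32  # assumed: a 16x16 dot-matrix glyph at 1 bit per pixel


class StaticLibrary:
    """Whole font resident in RAM; lookup is a direct offset computation."""

    def __init__(self, font: bytes):
        self.font = font

    def glyph(self, index: int) -> bytes:
        start = index * GLYPH_BYTES
        return self.font[start:start + GLYPH_BYTES]


class DynamicLibrary:
    """At most `capacity` glyphs resident; the least recently used is evicted."""

    def __init__(self, load_glyph, capacity: int):
        self.load_glyph = load_glyph  # hypothetical loader for the backing store
        self.capacity = capacity
        self.cache = OrderedDict()    # glyph index -> glyph bytes, oldest first
        self.misses = 0

    def glyph(self, index: int) -> bytes:
        if index in self.cache:
            self.cache.move_to_end(index)   # mark as most recently used
            return self.cache[index]
        self.misses += 1
        data = self.load_glyph(index)
        self.cache[index] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # drop the least recently used glyph
        return data
```

Under these assumptions the static form trades memory for constant-time access, while the behavior of the dynamic form depends on its miss count, which is the kind of quantity a descriptive method for library behavior would need to capture.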
An N-gram Chinese language model incorporating linguistic rules is presented. By constructing an element lattice, rule information is incorporated into the statistical framework. To facilitate the hybrid modeling, novel methods such as MI-based rule evaluation, weighted rule quantification, and element-based n-gram probability approximation are presented. A dynamic Viterbi algorithm is adopted to search for the best path in the lattice. To strengthen the model, transformation-based error-driven rule learning is adopted. Applying the proposed model to Chinese Pinyin-to-character conversion achieves high accuracy, flexibility, and robustness simultaneously. Tests show that the correct rate reaches 94.81%, compared with 90.53% using a bi-gram Markov model alone. Many long-distance dependencies and recursive structures in language can be processed effectively.
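To make the lattice search concrete, here is a minimal sketch of Viterbi decoding for Pinyin-to-character conversion under a plain bigram model, which corresponds to the baseline the abstract compares against rather than the rule-augmented model itself. The candidate table, the probabilities, and the `FLOOR` constant are toy values invented for illustration; the paper's element lattice, rule weighting, and smoothing are not reproduced here.

```python
import math

# Toy data, invented for illustration only.
CANDIDATES = {"zhong": ["中", "种"], "guo": ["国", "过"]}
UNIGRAM = {"中": 0.4, "种": 0.2, "国": 0.3, "过": 0.1}
BIGRAM = {
    ("中", "国"): 0.8, ("中", "过"): 0.05,
    ("种", "国"): 0.05, ("种", "过"): 0.3,
}
FLOOR = 1e-6  # crude stand-in for smoothing of unseen events


def viterbi(syllables):
    """Return the most probable character string under the bigram model."""
    if not syllables:
        return ""
    # best[c] = (log-probability of the best path ending in c, that path)
    best = {
        c: (math.log(UNIGRAM.get(c, FLOOR)), [c])
        for c in CANDIDATES[syllables[0]]
    }
    for syl in syllables[1:]:
        nxt = {}
        for c in CANDIDATES[syl]:
            # Keep only the best predecessor for each candidate character.
            score, path = max(
                (s + math.log(BIGRAM.get((p, c), FLOOR)), pth)
                for p, (s, pth) in best.items()
            )
            nxt[c] = (score, path + [c])
        best = nxt
    return "".join(max(best.values())[1])


print(viterbi(["zhong", "guo"]))  # prints 中国
```

Because each column of the lattice keeps one best path per candidate, the cost grows with the square of the number of candidates per syllable rather than exponentially with sentence length. A hybrid model of the kind described would change how each edge is scored, not this search structure.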