Chinese Phonetic-Character Conversion(CPCC) is an important issue in Chinese speech recognition and Chinese sentence keyboard input system. The approaches based on large corpus statistic Markov language model (such as...Chinese Phonetic-Character Conversion(CPCC) is an important issue in Chinese speech recognition and Chinese sentence keyboard input system. The approaches based on large corpus statistic Markov language model (such as bigram, trigram) become more and more popular today. This paper presents an improved Chinese word bigram, space-compressed Chinese word bigram, which stores the bi-word co-articulation frequency in the form of the bi-character co-articulation frequency. The bi-word co-articulation frequency is estimated from the bi-character co-articulation frequency library. The CPCC experiment with the improved Chinese word bigram shows: it can reach a higher correct conversion ratio with less space occupation.展开更多
基金Supported by the National Natural Science Foundation of China under Grant Nos.90412010 60573166 (国家自然科学基金)+1 种基金the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No.2007108 (高等学校博士学科点专项科研基金)the HP University Collaborative Foundation of China under Grant No.HLCFY08-001 (惠普大学合作基金)
文摘Chinese Phonetic-Character Conversion(CPCC) is an important issue in Chinese speech recognition and Chinese sentence keyboard input system. The approaches based on large corpus statistic Markov language model (such as bigram, trigram) become more and more popular today. This paper presents an improved Chinese word bigram, space-compressed Chinese word bigram, which stores the bi-word co-articulation frequency in the form of the bi-character co-articulation frequency. The bi-word co-articulation frequency is estimated from the bi-character co-articulation frequency library. The CPCC experiment with the improved Chinese word bigram shows: it can reach a higher correct conversion ratio with less space occupation.