Abstract
To reduce the difficulty of syntactic parsing for modern Chinese, this paper studies phrase recognition in real Chinese text using an improved Viterbi algorithm, based on corpora from Peking University and Harbin Institute of Technology. The method works within the hidden Markov model (HMM) framework. In the training phase, the HMM parameters are obtained by maximum likelihood estimation from statistical probability information. In the recognition phase, an improved Viterbi algorithm performs dynamic programming to identify phrases at the same level. On this basis, a layer-by-layer scanning algorithm is combined with the improved Viterbi algorithm to recognize nested Chinese phrases. Experimental results show that the precision of phrase recognition reaches 93.52% in the closed test and 77.529% in the open test, indicating that the algorithm has good adaptability and practicality for phrase recognition.
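The abstract names two building blocks: maximum likelihood estimation of the HMM parameters, and Viterbi decoding. The following is a minimal sketch of both, under several assumptions. It uses the standard Viterbi recurrence, not the improved variant, whose details the abstract does not give. The observations (part-of-speech tags), the hidden states (`B-NP`, `I-NP`, `O`), the toy training data, and the probability floor for unseen events are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Collect relative-frequency (maximum likelihood) counts for an HMM."""
    start = Counter()             # counts of sentence-initial states
    trans = defaultdict(Counter)  # trans[prev][state] = transition count
    emit = defaultdict(Counter)   # emit[state][obs] = emission count
    for sentence in tagged_sentences:
        prev = None
        for obs, state in sentence:
            if prev is None:
                start[state] += 1
            else:
                trans[prev][state] += 1
            emit[state][obs] += 1
            prev = state
    return sorted(emit), start, trans, emit

def logp(counter, key, floor=1e-9):
    """Log of the MLE probability; unseen events get a small floor (assumption)."""
    total = sum(counter.values())
    p = counter[key] / total if total else 0.0
    return math.log(max(p, floor))

def viterbi(obs_seq, states, start, trans, emit):
    """Standard Viterbi decoding in log space; returns the best state path."""
    if not obs_seq:
        return []
    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: logp(start, s) + logp(emit[s], obs_seq[0]) for s in states}]
    back = [{}]
    for t in range(1, len(obs_seq)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + logp(trans[p], s)) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + logp(emit[s], obs_seq[t])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs_seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy corpus: (part-of-speech tag, phrase-boundary state) pairs.
train = [
    [("n", "B-NP"), ("n", "I-NP"), ("v", "O"), ("a", "B-NP"), ("n", "I-NP")],
    [("r", "B-NP"), ("v", "O"), ("n", "B-NP"), ("n", "I-NP")],
    [("a", "B-NP"), ("n", "I-NP"), ("v", "O")],
]
states, start, trans, emit = train_hmm(train)
tags = viterbi(["a", "n", "v"], states, start, trans, emit)
print(tags)  # ['B-NP', 'I-NP', 'O']
```

Working in log space avoids numerical underflow on long sentences. Nested phrases could in principle be handled by rerunning such a decoder layer by layer on the output of the previous pass, but how the paper's scanning step does this is not specified in the abstract.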
Source
《计算机工程与设计》
CSCD
Peking University Core Journals (北大核心)
2007, Issue 3, pp. 530-531, 571 (3 pages in total)
Computer Engineering and Design
Funding
Scientific Research Fund Project of Xinzhou Teachers University (忻州师范学院), Shanxi Province (200623)