Abstract
Automatic Chinese word segmentation has long been recognized as a bottleneck in Chinese information processing. A review of existing segmentation techniques shows that they all rest on two implicit assumptions: that language is regular, and that words have definite boundaries. Neither assumption fits the complexity, compositionality, dynamism, and fuzziness of language. This paper adopts an algorithm based on the hidden Markov model (HMM): text is segmented with a cascaded hidden Markov model (CHMM) and then processed in layers, which improves segmentation accuracy while preserving segmentation efficiency.
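The abstract names an HMM-based approach but gives no algorithmic detail. As a rough illustration of the general idea only, the sketch below treats segmentation as character tagging with a single plain HMM decoded by Viterbi; it is not the paper's CHMM and has no layering step. The B/M/E/S tag set, the function names `viterbi` and `segment`, and every probability in `START`, `TRANS`, and `EMIT` are assumptions made up for this example; a real system would estimate the parameters from a segmented corpus.

```python
import math

# Tags: B = begin, M = middle, E = end of a multi-character word; S = single-character word.
STATES = "BMES"
NEG_INF = float("-inf")
lg = math.log

# Hypothetical toy parameters (log-probabilities), hand-set for illustration only.
START = {"B": lg(0.6), "S": lg(0.4)}
TRANS = {
    "B": {"M": lg(0.3), "E": lg(0.7)},
    "M": {"M": lg(0.3), "E": lg(0.7)},
    "E": {"B": lg(0.6), "S": lg(0.4)},
    "S": {"B": lg(0.6), "S": lg(0.4)},
}
EMIT = {
    "B": {"北": lg(0.8)},
    "M": {},
    "E": {"京": lg(0.8)},
    "S": {"我": lg(0.4), "爱": lg(0.4)},
}
FLOOR = -20.0  # log-score assigned to characters unseen for a given tag


def viterbi(text):
    """Return the most likely tag sequence for `text` under the toy HMM."""
    if not text:
        return []
    score = [{s: START.get(s, NEG_INF) + EMIT[s].get(text[0], FLOOR) for s in STATES}]
    back = []
    for ch in text[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in STATES:
            # Best predecessor tag for tag s at this position.
            best = max(STATES, key=lambda p: prev[p] + TRANS[p].get(s, NEG_INF))
            cur[s] = prev[best] + TRANS[best].get(s, NEG_INF) + EMIT[s].get(ch, FLOOR)
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # A sentence must end on a word boundary, so the last tag is E or S.
    last = max("ES", key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def segment(text):
    """Cut `text` into words after every character tagged E or S."""
    words, buf = [], ""
    for ch, tag in zip(text, viterbi(text)):
        buf += ch
        if tag in "ES":
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words


print(segment("我爱北京"))
```

Working in log-probabilities avoids numeric underflow on long sentences, and the `FLOOR` score keeps unseen characters decodable instead of ruling out every path.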
Author
Wei Xiao-ning (魏晓宁)
College of Computer Science & Technology, Nantong University, Nantong 226019, China
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2007, No. 11, pp. 885-886 (2 pages)