Abstract
Identifying unknown (out-of-vocabulary) words is of direct practical value to Chinese language processing systems and also plays a foundational role in them. In the automatic segmentation of large-scale Chinese text, unrecognized unknown words are a major cause of segmentation errors and a bottleneck that keeps many automatic segmentation systems from practical use. This paper first surveys the state of research on unknown word identification and the existing methods, and analyzes the advantages and disadvantages of current approaches. On that basis, it proposes a frame-structure-based method for recognizing proper nouns among unknown words.
Source
《计算机应用与软件》
CSCD
PKU Core Journal (北大核心)
2007, Issue 8, pp. 213-215 (3 pages)
Computer Applications and Software
Keywords
Proper noun recognition
Attribute tagging
Error-driven
Rules and instances