Abstract
Drawing on the morphological features of Tibetan, this paper proposes, for the first time according to the authors, an automatic word segmentation scheme for written Tibetan based on case auxiliary words and continuous features (BCCF, Based on Case auxiliary word and Continuous Feature). The scheme performs deterministic, cascaded (level-by-level) segmentation, supported by case auxiliary words, continuous features, a knowledge base of character properties, and a dictionary. Preliminary tests suggest that the scheme can detect and eliminate segmentation ambiguities and handle unknown (out-of-vocabulary) words, and that it therefore has practical value for improving the precision of Tibetan word segmentation.
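The cascaded idea in the abstract can be illustrated with a minimal sketch. This is not the authors' BCCF implementation: the abstract does not give the actual rules, the continuous features, or the knowledge base, so the two levels below (cut at case particles, then forward maximum matching against a dictionary) are an assumed simplification. The names `CASE_PARTICLES`, `LEXICON`, `split_on_particles`, `match_block`, and `segment` are hypothetical, and the particle list and lexicon are toy data.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)

# Syllables used in the toy example, defined once so every use matches exactly.
BOD, KYI, SKAD, YIG = "བོད", "ཀྱི", "སྐད", "ཡིག"

# Hypothetical, incomplete list of case particles (illustration only).
CASE_PARTICLES = {KYI, "གི", "གྱི", "གིས", "ཀྱིས", "གྱིས", "ནས", "ལས"}

# Hypothetical toy dictionary; each entry is a tuple of syllables.
LEXICON = {(BOD,), (SKAD, YIG)}
MAX_WORD_LEN = max(len(entry) for entry in LEXICON)


def split_on_particles(syllables):
    """Level 1 (assumed): cut the syllable stream into blocks at case particles."""
    blocks, current = [], []
    for syl in syllables:
        if syl in CASE_PARTICLES:
            if current:
                blocks.append(("block", current))
            blocks.append(("particle", [syl]))
            current = []
        else:
            current.append(syl)
    if current:
        blocks.append(("block", current))
    return blocks


def match_block(syllables):
    """Level 2 (assumed): forward maximum matching inside one block."""
    out, i = [], 0
    while i < len(syllables):
        for n in range(min(MAX_WORD_LEN, len(syllables) - i), 0, -1):
            cand = tuple(syllables[i:i + n])
            if cand in LEXICON:
                out.append((TSHEG.join(cand), "word"))
                i += n
                break
        else:
            # No dictionary entry starts here: flag as an unknown-word candidate.
            out.append((syllables[i], "unknown"))
            i += 1
    return out


def segment(text):
    """Run both levels and return (token, kind) pairs."""
    syllables = [s for s in text.split(TSHEG) if s]
    result = []
    for kind, syls in split_on_particles(syllables):
        if kind == "particle":
            result.append((syls[0], "particle"))
        else:
            result.extend(match_block(syls))
    return result


EXAMPLE = TSHEG.join([BOD, KYI, SKAD, YIG])  # "བོད་ཀྱི་སྐད་ཡིག"
for token, kind in segment(EXAMPLE):
    print(token, kind)
```

In this toy setup the genitive particle splits the input into two blocks, and each block is then matched against the dictionary. A real system would also have to decide when a particle-shaped syllable is part of a longer word, which is presumably where the continuous features and the knowledge base come in.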
Source
Applied Linguistics (《语言文字应用》)
CSSCI
PKU Core Journal (北大核心)
2003, No. 1, pp. 75-82 (8 pages)
Funding
National 863 Program (2001AA114040)
973 Program (G19980305074)
Keywords
case auxiliary words
continuous features
Tibetan word segmentation