Word-position-based tagging for Chinese word segmentation

Times cited: 10
Abstract: In recent years, character-based word-position tagging has greatly improved the performance of Chinese word segmentation. The approach recasts segmentation as the problem of labeling each character with its position within a word. Backed by strong sequence tagging models, it has gradually become the main technical route for Chinese word segmentation. Feature template selection is crucial to the method. Using a four-tag word-position set and conditional random fields (CRFs), this paper studies the technique further. Closed tests were run on the corpora of the third and fourth International Chinese Word Segmentation Bakeoffs, and the effects of different feature template sets on segmentation performance were compared. Experimental results show that the feature template set TMPT-10′ gives better segmentation performance than the traditional template set.
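To make the tagging formulation concrete, here is a minimal Python sketch. It assumes the common B/M/E/S labels for the four word positions (word-begin, word-middle, word-end, single-character word); the abstract does not name its labels. The `char_features` window of character unigrams and bigrams is a hypothetical, illustrative template set, not the paper's TMPT-10′, whose exact templates are not given here. Per-character feature dictionaries of this shape are the kind of input CRF toolkits such as sklearn-crfsuite accept; training the CRF itself is not shown.

```python
from typing import Dict, List

# Assumed four-tag set (labels not specified in the source):
# B = first character of a multi-character word, M = middle character,
# E = last character, S = single-character word.


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a segmented sentence into one word-position tag per character."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        # A word is closed by an E (end) or S (single) tag.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a malformed sequence that never closes its last word
        words.append(current)
    return words


def char_features(chars: str, i: int) -> Dict[str, str]:
    """Hypothetical feature templates for character i: unigrams C-2..C2,
    adjacent bigrams C-2C-1..C1C2, and the skip bigram C-1C1."""
    def c(k: int) -> str:
        j = i + k
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {f"C{k}": c(k) for k in range(-2, 3)}
    for k in range(-2, 2):
        feats[f"C{k}C{k + 1}"] = c(k) + c(k + 1)
    feats["C-1C1"] = c(-1) + c(1)
    return feats


if __name__ == "__main__":
    words = ["条件随机场", "模型"]
    tags = words_to_tags(words)
    print(tags)  # ['B', 'M', 'M', 'M', 'E', 'B', 'E']
    print(tags_to_words("".join(words), tags))  # ['条件随机场', '模型']
```

Decoding only needs the tag sequence: every E or S closes a word, which is why a tagger that labels characters is enough to produce a segmentation.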
Source: Journal of Shandong University (Engineering Science), indexed in CAS, Peking University core journal; 2010, No. 5, pp. 117-122 (6 pages)
Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20050007023)
Keywords: Chinese word segmentation; conditional random fields; word-position tagging; feature template
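Related literature: references 14; second-level references 61; co-citing documents 468; co-cited documents 58; citing documents 10; second-level citing documents 105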
