Study on Word Segmentation and Part-of-Speech Tagging (中文分词与词性标注研究)

Cited by: 48
Abstract: Word segmentation and part-of-speech (POS) tagging are fundamental tasks in Chinese Language Processing (CLP) and are widely applied in semantic understanding, machine translation, information retrieval, and other fields. Drawing on a survey of current research and application results, this paper classifies and discusses the basic methods of Chinese Word Segmentation (CWS) and POS tagging. For word segmentation, dictionary-based and statistics-based methods are introduced in detail, and the results of three word segmentation competitions are listed. For POS tagging, rule-based and statistics-based methods are described. The main methods for building joint models of CWS and POS tagging are then presented. The paper also analyzes the advantages and disadvantages of the various segmentation and tagging methods and, on that basis, offers suggestions for the further development of the field.
Authors: 梁喜涛 (Liang Xitao), 顾磊 (Gu Lei)
Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 2, pp. 175-180 (6 pages)
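Funding: National Natural Science Foundation of China (61302157); Ministry of Education Humanities and Social Sciences Research Youth Fund (12YJC870008); Jiangsu Provincial Education Department Philosophy and Social Science Fund for Universities (2013SJB870004); Jiangsu Province Social Science Research Cultural Excellence Project (12SWC-030)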
Keywords: Chinese word segmentation; active learning; POS tagging; natural language processing (CLP); joint model