
Domain-Specific Chinese Word Segmentation (cited by: 19)
Abstract: In domain-specific word segmentation, statistical methods are limited by the lack of annotated training corpora for the target domain, while dictionary-based methods still handle new (unknown) words and segmentation ambiguities poorly. To address these characteristics of domain-specific segmentation, this paper proposes an approach that combines statistical methods with a domain dictionary. The approach first refines the domain dictionary construction process to build a high-quality domain dictionary and uses a statistical method to obtain preliminary segmentation results. It then applies a secondary ambiguity-resolution step based on rules and on Chinese character subsets (character tables) with defined properties. In segmentation experiments on a construction law corpus, the approach achieves a precision of 92.08%, a recall of 94.26%, and an F-measure of 93.16%. It can also be combined with new word detection and similar methods to improve the handling of unknown words.
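The abstract describes a pipeline in which a statistical segmenter produces preliminary results that are then corrected using a domain dictionary, with output scored by precision, recall and F-measure. The sketch below illustrates that general idea under stated assumptions; it is not the authors' method. The dictionary correction pass (`merge_with_dictionary`, a hypothetical helper) simply re-joins adjacent tokens whose concatenation is a dictionary term, longest match first; the paper's actual rule- and character-table-based disambiguation is not reproduced. The scoring functions (`to_spans`, `prf`, `f_measure`, also hypothetical names) use the common word-level convention that a predicted word is correct only if its exact character span appears in the gold segmentation. The toy tokens, dictionary and gold standard are invented for illustration.

```python
from typing import Iterable, List, Set, Tuple


def merge_with_dictionary(tokens: List[str], domain_dict: Set[str],
                          max_len: int = 8) -> List[str]:
    """Hypothetical correction pass: re-join adjacent tokens from a statistical
    segmenter whenever their concatenation is a domain-dictionary term.
    The longest matching run of tokens wins (maximum matching over tokens)."""
    result: List[str] = []
    i = 0
    while i < len(tokens):
        span = 1  # default: keep the token as the segmenter produced it
        # Try the longest run of >= 2 tokens first, shrinking toward 2.
        for j in range(len(tokens), i + 1, -1):
            candidate = "".join(tokens[i:j])
            if len(candidate) <= max_len and candidate in domain_dict:
                span = j - i
                break
        result.append("".join(tokens[i:i + span]))
        i += span
    return result


def to_spans(words: Iterable[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    spans: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1: a predicted word counts as correct
    only if its exact character span also appears in the gold segmentation."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations cover different text")
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Toy construction-law phrase; tokens, dictionary and gold standard
    # are invented for illustration only.
    tokens = ["建设", "工程", "施工", "合同", "无效"]
    domain_dict = {"建设工程", "施工合同"}
    gold = ["建设工程", "施工合同", "无效"]

    merged = merge_with_dictionary(tokens, domain_dict)
    print(merged)               # ['建设工程', '施工合同', '无效']
    print(prf(gold, tokens))    # baseline scores of the raw segmenter output
    print(prf(gold, merged))    # (1.0, 1.0, 1.0)
    # Consistency check of the reported figures: F from P = 92.08%, R = 94.26%
    print(round(f_measure(0.9208, 0.9426), 4))  # 0.9316
```

The last line confirms that the reported F-measure of 93.16% is the harmonic mean of the reported precision and recall. Scoring on character spans rather than word strings avoids crediting a word that appears in both segmentations at different positions.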
Authors: CHENG Yusi (成于思), SHI Yuntao (施云涛) (School of Civil Engineering, Southeast University, Nanjing 210096, China; Nanjing Branch Network Department, China Mobile Communications Group, Nanjing 210019, China)
Source: Computer Engineering and Applications (《计算机工程与应用》; indexed in CSCD and the Peking University Core Journals list), 2018, no. 17, pp. 30-34 and 109 (6 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (No. 71601047); China Postdoctoral Science Foundation (No. 2015M581706)
Keywords: Chinese word segmentation; domain specific; ambiguity resolution; domain dictionary; construction law