Performance of smoothing algorithm in Chinese word segmentation by bigram (cited by: 4)
Abstract: Several smoothing algorithms are applied to bigram-based Chinese word segmentation. Using the People's Daily corpus of January 1998, the paper discusses the relationship between perplexity and actual segmentation performance, and compares the practical performance of the smoothing algorithms. The results show that simple additive smoothing performs best: precision and recall are 99.68% and 99.7% in the closed test, and 98.64% and 98.74% in the open test.
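Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 17, pp. 33-36 (4 pages)
Funding: National Natural Science Foundation of China (No. 70521001)
Keywords: data smoothing; Chinese word segmentation; bigram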
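
The abstract names additive smoothing and perplexity without giving formulas, so the following is only a minimal sketch of the textbook forms, not the paper's implementation. It assumes the standard estimate P(w | v) = (c(v, w) + delta) / (c(v) + delta * V), where V is the vocabulary size. The function names, the `<s>` and `</s>` boundary markers, and the default `delta=1.0` are all illustrative assumptions.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_bigram_counts(sentences):
    """Count bigram histories and bigrams over pre-segmented sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for words in sentences:
        tokens = [BOS] + list(words) + [EOS]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab


def additive_prob(prev, cur, history_counts, bigram_counts, vocab_size, delta=1.0):
    """Additive (add-delta) estimate of P(cur | prev); delta is an assumed value."""
    numerator = bigram_counts[(prev, cur)] + delta
    denominator = history_counts[prev] + delta * vocab_size
    return numerator / denominator


def perplexity(sentences, history_counts, bigram_counts, vocab_size, delta=1.0):
    """Per-token perplexity of segmented sentences under the smoothed bigram model."""
    log_sum = 0.0
    n_tokens = 0
    for words in sentences:
        tokens = [BOS] + list(words) + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            p = additive_prob(prev, cur, history_counts, bigram_counts, vocab_size, delta)
            log_sum += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)
```

With a toy training set such as `[["我", "爱", "北京"], ["我", "爱", "中文"]]`, a seen bigram like ("我", "爱") gets a higher probability than an unseen one like ("我", "北京"), but the unseen one is never zero. That is what lets a segmenter score candidate segmentations containing bigrams absent from the training corpus.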