Statistical Language Model for Chinese Text Proofreading

Abstract: Statistical language modeling techniques are investigated in order to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. The model takes full account of the information located both before and after the target word w_i, as well as the relationship between non-adjacent words w_i and w_j in the linguistic environment (LE). First, the word association degree between w_i and w_j, where w_j is l words apart from w_i in the LE, is defined using a distance-weighted factor. Then the Bayes formula is used to calculate the LE related degree of word w_i. Finally, the LE related degree is taken as the criterion for predicting how reasonable the occurrence of w_i is in its context. When the proposed model is compared with the traditional n-gram model in an automatic error detection system for Chinese text, the experimental results show that both the recall and the precision of error detection are improved.
Source: Journal of Beijing Institute of Technology (English Edition) [EI, CAS], 2003, No. 4, pp. 441-445 (5 pages)
Funding: the Youth Fund of Science and Technology of Shanxi Province (20021015)
Keywords: statistical language model; n-gram; linguistic environment; text proofreading
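
The abstract names the ingredients of the model (a distance-weighted word association degree and an LE related degree used as the plausibility criterion) but does not give the formulas. The following is a minimal sketch of that general idea, not the authors' method. It assumes a weight of 1/l for a context word l positions away, assumes an association degree equal to the weighted co-occurrence count divided by the count of the context word, and replaces the paper's Bayes-formula step with a plain average over the context window. The function names `train`, `association`, and `le_related_degree`, the `window` parameter, and the toy corpus are all hypothetical.

```python
from collections import Counter


def train(sentences, window=2):
    """Collect word counts and distance-weighted co-occurrence weights.

    `sentences` is a list of already-segmented sentences (lists of words).
    """
    word_count = Counter()
    pair_weight = Counter()
    for words in sentences:
        for i, w_i in enumerate(words):
            word_count[w_i] += 1
            # Look at context on both sides of position i.
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # Assumed distance weight: 1/l, where l = |i - j|.
                    pair_weight[(w_i, words[j])] += 1.0 / abs(i - j)
    return word_count, pair_weight


def association(w_i, w_j, word_count, pair_weight):
    """Assumed association degree of w_i given context word w_j."""
    n_j = word_count.get(w_j, 0)
    if n_j == 0:
        return 0.0
    return pair_weight.get((w_i, w_j), 0.0) / n_j


def le_related_degree(words, i, word_count, pair_weight, window=2):
    """Average association of words[i] with its neighbours on both sides.

    This stands in for the Bayes-formula step, which the abstract
    does not spell out.
    """
    lo = max(0, i - window)
    hi = min(len(words), i + window + 1)
    scores = [
        association(words[i], words[j], word_count, pair_weight)
        for j in range(lo, hi)
        if j != i
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy segmented corpus (illustrative only).
corpus = [
    ["我", "喜欢", "吃", "苹果"],
    ["他", "喜欢", "吃", "香蕉"],
    ["我", "喜欢", "看", "电影"],
]
word_count, pair_weight = train(corpus)

# Score the word at position 2 in a plausible and an implausible sentence.
good = le_related_degree(["我", "喜欢", "吃", "苹果"], 2, word_count, pair_weight)
bad = le_related_degree(["我", "喜欢", "电影", "苹果"], 2, word_count, pair_weight)
print(good, bad)
```

In a detection setting, a word whose score falls below some tuned threshold would be flagged as a possible error. Unlike a left-to-right n-gram, the score draws on words on both sides of the target and on non-adjacent words, which is the property the abstract emphasizes.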