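Archaic Chinese Punctuating Sentences Based on Context N-gram Model

Original title: 基于前后文n-gram模型的古汉语句子切分 ("Sentence Segmentation of Archaic Chinese Based on a Context N-gram Model"). Times cited: 25.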
Abstract: This paper proposes a sentence segmentation (punctuation) algorithm for archaic Chinese based on a context n-gram model, that is, a model that draws on both the preceding and the following text. By collecting context information, the algorithm predicts segmentation positions fairly accurately even when data are sparse. It therefore copes well with small training corpora and reduces the effect of data sparseness on segmentation accuracy. In sentence segmentation experiments on the Analects of Confucius (Lunyu), the algorithm achieved a recall of 81% and a precision of 52%.
Source: Computer Engineering, 2007, No. 3, pp. 192-193, 196 (3 pages). Indexed in: CAS, CSCD, Peking University Core Journals.
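Funding: National Natural Science Foundation of China (No. 60073046); Specialized Research Fund for the Doctoral Program of Higher Education, SRFDP (No. 20020610007).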
Keywords: n-gram model; data sparseness; smoothing techniques; context-based n-gram model
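The abstract describes the method only at a high level. The Python sketch below is a hedged illustration, not the authors' algorithm. It assumes that the "context" of a candidate gap is up to `max_k` characters on each side, that sparse counts are handled with Lidstone (add-lambda) smoothing (the abstract does not name its smoothing method), and that left and right estimates are simply averaged. The names `ContextNgramSegmenter`, `strip_punctuation`, `precision_recall` and the punctuation set are all hypothetical. The sketch also computes boundary-level precision and recall, the two metrics the abstract reports.

```python
from collections import defaultdict

# Hypothetical set of punctuation marks treated as sentence/clause boundaries.
BOUNDARY_MARKS = set("，。！？；：、")


def strip_punctuation(text):
    """Return (chars, gold): the unpunctuated text and the set of indices i
    such that a boundary follows chars[i] in the punctuated original."""
    chars, gold = [], set()
    for ch in text:
        if ch in BOUNDARY_MARKS:
            if chars:
                gold.add(len(chars) - 1)
        else:
            chars.append(ch)
    return "".join(chars), gold


class ContextNgramSegmenter:
    """Hypothetical sketch: the gap after chars[i] is scored from up to max_k
    characters on its left AND up to max_k characters on its right."""

    def __init__(self, max_k=2, lam=0.5):
        self.max_k = max_k
        # Assumed Lidstone (add-lambda) smoothing constant.
        self.lam = lam
        # (side, k, context string) -> [boundary count, total count]
        self.counts = defaultdict(lambda: [0, 0])

    def _context_keys(self, chars, i):
        for k in range(1, self.max_k + 1):
            yield ("L", k, chars[max(0, i + 1 - k):i + 1])  # text before the gap
            yield ("R", k, chars[i + 1:i + 1 + k])          # text after the gap

    def train(self, punctuated_texts):
        for text in punctuated_texts:
            chars, gold = strip_punctuation(text)
            for i in range(len(chars)):
                for key in self._context_keys(chars, i):
                    self.counts[key][0] += int(i in gold)
                    self.counts[key][1] += 1

    def boundary_prob(self, chars, i):
        # Lidstone estimate (b + lam) / (n + 2 * lam) for each observed context,
        # averaged over left and right contexts of every length k. Unseen
        # contexts are skipped; with no evidence at all, predict no boundary.
        probs = []
        for key in self._context_keys(chars, i):
            b, n = self.counts.get(key, (0, 0))
            if n:
                probs.append((b + self.lam) / (n + 2 * self.lam))
        return sum(probs) / len(probs) if probs else 0.0

    def predict_gaps(self, chars, threshold=0.5):
        # The end of the text is always a boundary, so only inner gaps are scored.
        return {i for i in range(len(chars) - 1)
                if self.boundary_prob(chars, i) > threshold}

    def segment(self, chars, threshold=0.5):
        pieces, start = [], 0
        for i in sorted(self.predict_gaps(chars, threshold)):
            pieces.append(chars[start:i + 1])
            start = i + 1
        pieces.append(chars[start:])
        return pieces


def precision_recall(predicted, gold):
    """Boundary-level precision and recall over sets of gap indices."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

For example, after `train(["学而时习之，不亦说乎？", "有朋自远方来，不亦乐乎？"])`, `segment("有朋自远方来不亦说乎")` returns `['有朋自远方来', '不亦说乎']`, a string not seen verbatim in training, because the right context 不亦 was always preceded by a boundary. A realistic system would need a much larger corpus, a tuned threshold, and probably a better way of combining the left and right estimates than a plain average.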