基于树剪枝的典籍文本快速切分方法研究——以《茶经》的翻译为例
Tree Pruning Based Fast Segmentation of Classical Texts: A Case Study on the Translation of the Classic of Tea
Abstract: Taking the translation of the Classic of Tea as its case, this paper proposes a fast segmentation method for classical Chinese texts based on tree pruning. First, two-character, three-character, and longer word candidates are selected with the likelihood ratio statistic. On this basis, a tree-pruning segmentation algorithm for classical texts is constructed and its basic flow chart is given. Finally, the Classic of Tea is used to verify the validity and soundness of the algorithm. Theoretical analysis and worked examples show that the algorithm segments classical texts automatically and effectively, reduces the time complexity of the computation, and holds good promise for the computer-aided translation and international dissemination of the Chinese classics.
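The abstract describes a two-stage procedure: likelihood ratio scoring of two-, three-, and multi-character candidate units, followed by a tree-pruning search that settles on one segmentation. The paper's formulas and pruning rules are not reproduced on this page, so the Python sketch below is only an illustration under assumptions: it scores two-character candidates with a Dunning-style log-likelihood ratio and segments with a best-path search that discards spans never proposed as candidates. The function names, the 10.83 threshold, and the max_len limit are choices made for this example, not taken from the paper.

```python
import math
from collections import Counter

def llr(k11, k12, k21, k22):
    """Dunning-style log-likelihood ratio for a 2x2 contingency table."""
    def entropy(*counts):
        total = sum(counts)
        return sum(k * math.log(k / total) for k in counts if k > 0)
    return 2.0 * (entropy(k11, k12, k21, k22)
                  - entropy(k11 + k12, k21 + k22)
                  - entropy(k11 + k21, k12 + k22))

def bigram_candidates(text, threshold=10.83):
    """Score adjacent character pairs with LLR; keep pairs above the threshold."""
    chars = [c for c in text if not c.isspace()]
    unigrams = Counter(chars)
    bigrams = Counter(zip(chars, chars[1:]))
    n = max(len(chars) - 1, 1)
    scores = {}
    for (a, b), k11 in bigrams.items():
        k12 = max(unigrams[a] - k11, 0)   # a followed by something other than b
        k21 = max(unigrams[b] - k11, 0)   # b preceded by something other than a
        k22 = max(n - k11 - k12 - k21, 0)
        scores[a + b] = llr(k11, k12, k21, k22)
    return {w: s for w, s in scores.items() if s >= threshold}

def segment(sentence, candidates, max_len=4):
    """Best-path segmentation over candidate units: multi-character spans that
    were never selected as candidates are pruned; single characters always pass."""
    n = len(sentence)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(n):
        if best[i] == float("-inf"):
            continue                      # no surviving path reaches position i
        for j in range(i + 1, min(i + max_len, n) + 1):
            piece = sentence[i:j]
            if j - i == 1:
                score = candidates.get(piece, 0.0)
            elif piece in candidates:
                score = candidates[piece]
            else:
                continue                  # prune: this span was never proposed
            if best[i] + score > best[j]:
                best[j], back[j] = best[i] + score, i
    pieces, j = [], n
    while j > 0:                          # recover the segmentation from back-pointers
        pieces.append(sentence[back[j]:j])
        j = back[j]
    return list(reversed(pieces))
```

In practice the candidate scores would be mined from the full text of the Classic of Tea before each sentence is segmented; the original algorithm additionally scores three-character and longer units and prunes the candidate tree directly, which this dynamic-programming stand-in does not attempt to reproduce.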
Source: Journal of Chinese Information Processing (《中文信息学报》, CSCD, Peking University Core Journal), 2010, No. 6, pp. 10-13 and 42 (5 pages).
Funding: National Natural Science Foundation of China (60673039); Higher Education Research Project of the Liaoning Provincial Department of Education, 2009 (2009A139); Humanities and Social Sciences Research Fund of Dalian University of Technology, 2008 (DUTHS2008320).
Keywords: segmentation; tree pruning; likelihood ratio; the Classic of Tea; computer-aided translation