
Design of Tibetan word segmentation dictionary with multi-level index (cited 6 times)
Abstract: The Tibetan word segmentation dictionary is a vital foundation of any Tibetan automatic word segmentation system; the scale of the dictionary and the quality of its algorithm design directly affect segmentation efficiency. This project first collected all entries, together with the Tibetan punctuation marks, from several Tibetan character and word dictionaries, forming a large Tibetan segmentation lexicon of about 100,000 entries. Then, based on the varying lengths of Tibetan characters, a multi-level index dictionary mechanism specific to Tibetan was built, and a Tibetan whole-word binary search method ("whole-word dichotomy") was analyzed and designed for Tibetan word segmentation. Experimental results indicate that the dictionary has a simple structure, fast segmentation, and high lookup performance.
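The abstract does not spell out the exact index layout, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that syllables are split on the tsheg (་, U+0F0B); that the first index level is keyed by a word's first syllable and the second by its length in syllables; that each resulting bucket holds whole words in sorted order and is searched by binary search; and that segmentation uses forward maximum matching. The names `split_syllables`, `MultiLevelDict`, `max_len`, `contains`, and `segment`, as well as the choice of matching strategy, are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark "་"


def split_syllables(text):
    """Split Tibetan text into syllables on the tsheg, dropping empty pieces."""
    return [s for s in text.split(TSHEG) if s]


class MultiLevelDict:
    """Hypothetical index: first syllable -> length in syllables -> sorted whole words."""

    def __init__(self, words):
        levels = defaultdict(lambda: defaultdict(list))
        for word in words:
            syls = tuple(split_syllables(word))
            if syls:
                levels[syls[0]][len(syls)].append(syls)
        # Sort and de-duplicate each bucket so it can be searched by binary search.
        self.index = {
            first: {n: sorted(set(bucket)) for n, bucket in by_len.items()}
            for first, by_len in levels.items()
        }

    def max_len(self, first):
        """Longest entry (in syllables) that starts with `first`; 0 if none."""
        by_len = self.index.get(first)
        return max(by_len) if by_len else 0

    def contains(self, syls):
        """Pick the bucket via the two index levels, then binary-search the whole word."""
        bucket = self.index.get(syls[0], {}).get(len(syls))
        if not bucket:
            return False
        i = bisect_left(bucket, syls)
        return i < len(bucket) and bucket[i] == syls


def segment(syllables, lexicon):
    """Forward maximum matching over a syllable list (an assumed strategy)."""
    words, i = [], 0
    while i < len(syllables):
        longest = min(lexicon.max_len(syllables[i]), len(syllables) - i)
        for n in range(longest, 0, -1):
            if lexicon.contains(tuple(syllables[i:i + n])):
                break
        else:
            n = 1  # no dictionary match: emit the single syllable as-is
        words.append(TSHEG.join(syllables[i:i + n]))
        i += n
    return words


if __name__ == "__main__":
    lexicon = MultiLevelDict(["བོད་ཡིག", "བོད་སྐད", "བོད", "ཡིག"])
    print(segment(split_syllables("བོད་སྐད་ཡིག"), lexicon))
```

With the toy lexicon above, the input བོད་སྐད་ཡིག is split into the two words བོད་སྐད and ཡིག. Keying the second level by syllable count means each binary search runs only over same-length words that share a first syllable, and `max_len` caps the longest candidate the matcher tries at each position.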
Source: Journal of Computer Applications, 2009, Issue B06, pp. 178-180 (3 pages). Indexed in CSCD; Peking University Core Journal.
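Funding: Open Project of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; National 863 Program project (AA2006010101)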
Keywords: Tibetan word segmentation; word segmentation dictionary; Tibetan whole-word dichotomy; multi-level index