Daiwen Word Segmentation System Design and Implementation (傣文自动分词系统的设计与实现), cited by: 2
Abstract: Automatic word segmentation for Daiwen (written Dai) is foundational work in Daiwen information processing. It underpins later tasks such as Daiwen input method development, Daiwen machine translation system development, and Daiwen text information extraction. Because Daiwen corpus technology is limited, natural language processing for Daiwen remains relatively weak. This paper first analyzes the characteristics of Daiwen and, on that basis, builds a Daiwen corpus. It then applies Chinese word segmentation methods to Daiwen and, taking Daiwen's own characteristics into account, designs a Daiwen word segmentation system based on syllable sequence labeling. In experiments, the system reached a comprehensive evaluation score of 95.58%.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2013, No. 6, pp. 187-191 (5 pages)
Funding: National Natural Science Foundation of China (grants 61273288, 61233009, 61203258, 61305003, 61332017, 61375027); China-Singapore Institute of Digital Media (CSIDM) fund
Keywords: Daiwen; word segmentation; CRF; absolute segmentation word
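
The abstract describes segmentation as syllable sequence labeling but does not give the tag set or the evaluation formula. The sketch below illustrates the general idea under stated assumptions: a four-tag B/M/E/S scheme (begin, middle, end, single-syllable word), which is a common choice in Chinese segmentation and not necessarily the one used in the paper, and a word-level precision/recall/F1 score, which may or may not match the paper's "comprehensive evaluation score". The syllables are placeholder strings, and no CRF is trained here; the tags that a trained CRF would predict are simply passed in. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Word = List[str]  # a word is a list of syllables (placeholder strings here)


def words_to_tags(words: List[Word]) -> Tuple[List[str], List[str]]:
    """Flatten segmented words into syllables plus one B/M/E/S tag per syllable.

    Assumed tag set (not confirmed by the paper): B = first syllable of a
    multi-syllable word, M = middle, E = last, S = single-syllable word.
    """
    syllables: List[str] = []
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty words defensively
        syllables.extend(word)
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return syllables, tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[Word]:
    """Rebuild words from syllables and tags; a word closes on E or S."""
    words: List[Word] = []
    current: Word = []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def word_spans(words: List[Word]) -> Set[Tuple[int, int]]:
    """Represent each word as a (start, end) syllable-offset span."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall_f1(gold: List[Word], pred: List[Word]) -> Tuple[float, float, float]:
    """Word-level scores: a predicted word counts only if both boundaries match."""
    gold_spans, pred_spans = word_spans(gold), word_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
    syllables, tags = words_to_tags(gold)
    print(tags)
    predicted = tags_to_words(syllables, ["B", "E", "B", "E", "B", "E"])
    print(precision_recall_f1(gold, predicted))
```

In a full system, the syllable and tag sequences produced by words_to_tags would serve as training data for a CRF, and tags_to_words would turn the CRF's predicted tags back into a segmentation.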