
Study on New Words of Web Based on Statistical Word Segmentation (基于统计方法的Web新词分词方法研究)

Cited by: 2
Abstract: This paper surveys the word segmentation methods used in information processing. Because current segmentation methods cannot recognize the new words that continually appear on the Web, the authors design a new statistics-based word segmentation method. The method avoids the complex grammar rules of existing approaches, requires no dictionary support, copes well with continually emerging new words, and segments quickly. The authors conclude that it has both theoretical and practical value.
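Authors: Zhang Min (张敏), Wang Chunhong (王春红)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 5, pp. 133-135 (3 pages)
Funding: Shanxi Province Higher Education Science and Technology Development Project (20091150); Yuncheng University Project (JC-2009009)
Keywords: Web; statistical word segmentation; dictionary; feature extraction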
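The abstract does not describe the algorithm itself, so the following is only a minimal sketch of one common dictionary-free statistical technique, not the authors' method. It scores adjacent character pairs by pointwise mutual information (PMI) and keeps the pairs that co-occur more often than chance would predict. The function name `candidate_words` and the thresholds `min_count` and `min_pmi` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def candidate_words(text, min_count=2, min_pmi=1.0):
    """Return two-character strings whose PMI meets a threshold.

    Illustrative sketch only: no dictionary is consulted, and punctuation
    is treated like any other character.
    """
    # Drop whitespace so that only characters of the running text remain.
    chars = [c for c in text if not c.isspace()]
    unigrams = Counter(chars)
    bigrams = Counter(zip(chars, chars[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # too rare to estimate reliably
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # PMI > 0 means the pair occurs together more often than
        # independent characters would.
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi >= min_pmi:
            scored[a + b] = pmi
    return scored


if __name__ == "__main__":
    sample = "给力给力给力真给力"
    for word, score in candidate_words(sample).items():
        print(word, round(score, 2))
```

On a short sample such as the one above, the pair "给力" passes the threshold while the reversed pair "力给" does not. A practical system would need a much larger corpus, longer candidate strings, and additional filters.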
