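Implementation of a Modern Chinese Word Segmentation System Based on Rules, String Frequency Statistics and Contextual Relations (Cited by: 2)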

Implementation of a Modern Chinese Character Segmentation System Based on Rules, String Frequency Statistics and Context Analysis
Abstract: This paper describes a modern Chinese word segmentation system that combines rules, string frequency statistics and analysis of Chinese contextual relations. The system scans the source text three times. In the first pass, it reads the text into memory and uses rules to break it into a number of strings, which are organized into a cross-linked list of text segments. In the second pass, it counts how many times each substring of every string recurs in the context, and treats the substrings that the statistics indicate are most likely to be words as temporary words. In the third pass, it segments the text using the contextual relations of Chinese grammar together with a dictionary. The system performs well on unregistered (out-of-vocabulary) words.
Source: Journal of Inner Mongolia Normal University (Natural Science Edition) [《内蒙古师范大学学报(自然科学汉文版)》], indexed in CAS, 2008, No. 1, pp. 71-74 (4 pages)
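Funding: Key Scientific Research Fund of the Education Department of Sichuan Province (2003A105); Open Fund of the Key Laboratory of Computer Technology Application of Yunnan Province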
Keywords: Chinese word segmentation; unknown (unregistered) words; modern Chinese automatic word segmentation system
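The three-pass procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are hypothetical stand-ins:

- the splitting rule, which breaks at any non-Chinese character;
- the parameters `max_len` and `min_count`;
- the substring-reduction filter;
- the toy dictionary;
- the function names.

The paper's cross-linked list of segments is replaced by a plain Python list. The third pass substitutes simple forward maximum matching for the paper's grammar-based context analysis.

```python
import re
from collections import Counter

# Hypothetical splitting rule: every run of non-CJK characters (punctuation,
# digits, Latin letters, whitespace) is treated as a segment boundary.
SPLIT_RULE = re.compile(r"[^\u4e00-\u9fff]+")


def pass1_split(text):
    """Pass 1: rule-based split of the raw text into Chinese-only segments.

    A plain list stands in for the paper's cross-linked list of segments.
    """
    return [seg for seg in SPLIT_RULE.split(text) if seg]


def pass2_temp_words(segments, max_len=4, min_count=2):
    """Pass 2: count substrings of length 2..max_len within every segment and
    keep the recurring ones that are most likely to be words."""
    counts = Counter()
    for seg in segments:
        for n in range(2, max_len + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    # Assumed substring-reduction filter: drop a string when a longer frequent
    # string containing it has the same count, which suggests it only occurs
    # as part of the longer string.
    return {
        s for s, c in frequent.items()
        if not any(len(t) > len(s) and s in t and frequent[t] == c for t in frequent)
    }


def pass3_segment(segments, lexicon, max_len=4):
    """Pass 3, simplified to forward maximum matching: at each position take the
    longest lexicon entry, falling back to a single character. The paper's
    grammar-based context analysis is not modelled here."""
    words = []
    for seg in segments:
        i = 0
        while i < len(seg):
            for n in range(min(max_len, len(seg) - i), 0, -1):
                if n == 1 or seg[i:i + n] in lexicon:
                    words.append(seg[i:i + n])
                    i += n
                    break
    return words


if __name__ == "__main__":
    text = "小明喜欢编程。小明每天编程，小明很快乐。"
    segments = pass1_split(text)
    temp_words = pass2_temp_words(segments)  # {"小明", "编程"}
    dictionary = {"喜欢", "每天", "快乐"}  # hypothetical toy dictionary
    print(pass3_segment(segments, dictionary | temp_words))
    # prints ['小明', '喜欢', '编程', '小明', '每天', '编程', '小明', '很', '快乐']
```

In this toy input, 小明 (a personal name) and 编程 ("programming") are missing from the dictionary but recur in the text. The second pass therefore promotes them to temporary words, and the third pass keeps them whole. This is the effect on unregistered words that the abstract reports, shown here only on a toy example.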