WWW-based Recognition of Non-login Words (基于WWW的未登录词识别研究)

Cited by: 7
Abstract: Little reference material is currently available on the recognition of non-login words (未登录词: words absent from the segmentation dictionary, often called unknown or out-of-vocabulary words). Rule-based and syntax-based solutions cannot satisfactorily handle every kind of non-login word. This paper studies and compares several existing solutions, then proposes extracting N-grams from the text after word segmentation and selecting non-login words from those N-grams by probability statistics. Experiments show that the method performs favorably in efficiency, recall, and accuracy.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2002, No. 12, pp. 155-156 (2 pages)
Keywords: Chinese information processing; Chinese word segmentation; WWW; non-login (unknown) word recognition; segmentation dictionary; computer; N-gram
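The approach outlined in the abstract (segment first, collect N-grams of adjacent tokens, then filter them statistically) can be illustrated with a minimal Python sketch. The abstract does not say which statistic or thresholds the authors used, so the pointwise mutual information score, the `min_count` and `min_pmi` values, the function names, and the toy data below are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from math import log2


def extract_ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_words(segmented_sentences, dictionary, min_count=2, min_pmi=1.0):
    """Propose out-of-dictionary words from adjacent token pairs.

    Assumption: pointwise mutual information (PMI) stands in for the
    unspecified "probability statistics"; both thresholds are hypothetical.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in segmented_sentences:
        unigrams.update(tokens)
        bigrams.update(extract_ngrams(tokens, 2))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    candidates = {}
    for (left, right), count in bigrams.items():
        word = left + right
        # Skip rare pairs and strings the dictionary already covers.
        if count < min_count or word in dictionary:
            continue
        p_pair = count / total_bigrams
        p_left = unigrams[left] / total_unigrams
        p_right = unigrams[right] / total_unigrams
        pmi = log2(p_pair / (p_left * p_right))
        if pmi >= min_pmi:
            candidates[word] = pmi
    return candidates


# Toy input: a segmenter without "博客" in its dictionary splits it in two.
sentences = [
    ["我", "喜欢", "博", "客"],
    ["博", "客", "很", "流行"],
    ["他", "写", "博", "客"],
]
known_words = {"我", "喜欢", "很", "流行", "他", "写"}
print(list(candidate_words(sentences, known_words)))  # prints ['博客']
```

The sketch handles bigrams only; longer N-grams would need a generalized association score, which is left out here to keep the example short.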
