
A Rule-statistic Model Based on Tag and an Algorithm to Recognize Unknown Words (基于标记的规则统计模型与未登录词识别算法)

Cited by: 13
Abstract Word segmentation is a prerequisite for Chinese text understanding. Targeting a small lexicon, this paper proposes a disambiguation method based on a rule-statistic model and a word-weight algorithm for recognizing unknown words. Characters that frequently give rise to segmentation ambiguity are learned from a large corpus and used as ambiguity tags, and the rule-statistic model classifies and processes the context of each tag. The remaining text is segmented by forward or backward dynamic maximum matching. For each run of consecutive single characters, the word-weight algorithm decides whether the run forms an unknown multi-character word. In experiments with a software prototype built by the authors, the system reached a precision of 98.88% and a recall of 98.32%, which the authors take as evidence that the approach is effective and robust.
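The maximum-matching step and the collection of consecutive single characters mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the "dynamic" matching or the word-weight scoring work, so the sketch uses plain greedy forward maximum matching and only gathers candidate runs without scoring them. The function names, the toy lexicon, and the `max_len` and `min_run` parameters are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching (assumed variant, not the paper's
    'dynamic' version). Falls back to a single character when no lexicon
    word starts at the current position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a lexicon word fits.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def single_char_runs(tokens, min_run=2):
    """Join runs of consecutive single-character tokens into candidate
    unknown words. A real system would score each candidate (the paper
    uses a word-weight algorithm, not reproduced here)."""
    runs, current = [], []
    for tok in tokens:
        if len(tok) == 1:
            current.append(tok)
            continue
        if len(current) >= min_run:
            runs.append("".join(current))
        current = []
    if len(current) >= min_run:
        runs.append("".join(current))
    return runs


# Toy lexicon (hypothetical); the personal name is deliberately missing.
lexicon = {"研究", "中文", "分词"}
tokens = forward_max_match("王小明研究中文分词", lexicon)
print(tokens)                    # ['王', '小', '明', '研究', '中文', '分词']
print(single_char_runs(tokens))  # ['王小明']
```

In this toy run the name 王小明 is absent from the lexicon, so it falls out as three single characters, and the run is then surfaced as one candidate unknown word. Single-character words that do belong to the lexicon would also be swept into such runs, which is one reason a scoring step is needed before accepting a candidate.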
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2004, No. 15, pp. 43-45, 91 (4 pages)
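Funding National 973 Basic Research Program of China (No. 2002CB312103); National Natural Science Foundation of China (No. 60373056); Key Program of the National Natural Science Foundation of China (No. 60033020)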
Keywords ambiguity tag, rule-statistic model, n-gram, word-weight algorithm
