Resolution of Overlapping Ambiguity Strings Based on Maximum Entropy Model (基于最大熵模型的交集型切分歧义消解)

Cited by: 6
Abstract: This paper uses a maximum entropy model to resolve overlapping ambiguity strings (OAS) in Chinese automatic word segmentation. The model has two output classes: either the first two characters of the OAS form a word, or the last two characters form a word. Its features are the one word of context on each side of the OAS, the OAS itself, and the relative magnitude of the word probabilities of the two candidate segmentations. OAS in the training text are located by combining forward maximum matching (FMM) with backward maximum matching (BMM) segmentation, then annotated and used to train the maximum entropy model. Training and testing use the OAS that occur in the People's Daily corpus of January 1998. The closed test accuracy is 98.64% and the open test accuracy is 95.01%; the latter is 3.76% higher than that of the commonly used word probability method.
Authors: Zhang Feng (张锋), Fan Xiaozhong (樊孝忠)
Source: Transactions of Beijing Institute of Technology (《北京理工大学学报》), 2005, No. 7, pp. 590-593 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: Chinese information processing; Chinese automatic word segmentation; overlapping ambiguity strings; maximum entropy model
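
The detection and feature steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the probabilities, the 4-character maximum word length, the restriction to three-character OAS, and all function names (`fmm`, `bmm`, `find_oas`, `prob_relation`, `features`) are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon and word probabilities (not taken from the paper).
LEXICON: Set[str] = {"从小", "小学", "钢琴"}
PROB: Dict[str, float] = {"从小": 0.004, "学": 0.01, "从": 0.02, "小学": 0.001}
TEXT = "他从小学钢琴"

def fmm(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: scan left to right, taking the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:  # single characters are always accepted
                words.append(piece)
                i += n
                break
    return words

def bmm(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Backward maximum matching: scan right to left, taking the longest lexicon word."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or piece in lexicon:
                words.append(piece)
                j -= n
                break
    return words[::-1]

def cuts(words: List[str]) -> Set[int]:
    """Character offsets at which a segmentation places word boundaries."""
    pos, out = 0, {0}
    for w in words:
        pos += len(w)
        out.add(pos)
    return out

def find_oas(text: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end, string) for spans where FMM and BMM disagree."""
    fb, bb = cuts(fmm(text, lexicon, max_len)), cuts(bmm(text, lexicon, max_len))
    common = sorted(fb & bb)
    spans = []
    for s, e in zip(common, common[1:]):
        # A cut strictly inside (s, e) belongs to only one segmentation,
        # so the two methods segment this span differently.
        if any(s < p < e for p in fb | bb):
            spans.append((s, e, text[s:e]))
    return spans

def prob_relation(oas: str, prob: Dict[str, float], floor: float = 1e-8) -> str:
    """Compare P(AB)P(C) with P(A)P(BC) for a three-character OAS 'ABC' (assumed form)."""
    if len(oas) != 3:
        return "unknown"
    first = prob.get(oas[:2], floor) * prob.get(oas[2], floor)
    last = prob.get(oas[0], floor) * prob.get(oas[1:], floor)
    return "first_two" if first >= last else "last_two"

def features(text: str, span: Tuple[int, int, str], lexicon: Set[str],
             prob: Dict[str, float], max_len: int = 4) -> Dict[str, str]:
    """Context word on each side, the OAS itself, and the word-probability relation."""
    s, e, oas = span
    left, right = fmm(text[:s], lexicon, max_len), fmm(text[e:], lexicon, max_len)
    return {
        "left_word": left[-1] if left else "<BOS>",
        "right_word": right[0] if right else "<EOS>",
        "oas": oas,
        "prob_relation": prob_relation(oas, prob),
    }
```

With the toy data, `find_oas(TEXT, LEXICON)` returns the single span `(1, 4, "从小学")`, because FMM yields 他/从小/学/钢琴 while BMM yields 他/从/小学/钢琴. The dictionary produced by `features` is the kind of input that could then be passed to any maximum entropy (multinomial logistic regression) trainer; the word-probability baseline corresponds to using `prob_relation` alone.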
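References (7)

  • 1 Sun Maosong, Zuo Zhengping, Huang Changning. An algorithm for resolving three-character overlapping segmentation ambiguities in Chinese (in Chinese) [J]. Journal of Tsinghua University (Science and Technology), 1999, 39(5): 101-103. Cited by: 22
  • 2 Liang Nanyuan. CDWS: an automatic word segmentation system for written Chinese (in Chinese) [J]. Journal of Chinese Information Processing, 1987, (2): 44-52.
  • 3 Li Rong, Liu Shaohui, Ye Shiwei, Shi Zhongzhi. A method for segmenting Chinese overlapping ambiguities based on combining SVM and k-NN (in Chinese) [J]. Journal of Chinese Information Processing, 2001, 15(6): 13-18. Cited by: 19
  • 4 Ratnaparkhi A. Maximum entropy models for natural language ambiguity resolution [D]. Pennsylvania: University of Pennsylvania, 1998.
  • 5 Berger A L, Pietra S A D, Pietra V J D. A maximum entropy approach to natural language processing [J]. Computational Linguistics, 1996, 22(1): 39-71.
  • 6 Darroch J N, Ratcliff D. Generalized iterative scaling for log-linear models [J]. The Annals of Mathematical Statistics, 1972, 43(5): 1470-1480.
  • 7 Pietra S D, Pietra V D, Lafferty J. Inducing features of random fields [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(4): 380-393.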

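Second-level references: 7

Documents sharing references with this article: 74

Documents co-cited with this article: 71

Citing documents: 6

Second-level citing documents: 12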
