
Method of new word identification based on large-scale corpus (cited by: 24)
Abstract This paper proposes a method for identifying new words in a large-scale corpus. Starting from repeated-string statistics, it analyzes both the external context and the internal composition of each candidate string. It checks, in turn, the variety of adjacent context characters, the word-formation probability of the first and last characters in their positions, and the coupling degree of character pairs. Candidates that fail a check are filtered out, and the strings that remain are taken as new words. Experiments on corpora of different sizes show that the method is feasible and effective and can be applied to areas such as dictionary compilation and term extraction.
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2007, No. 21, pp. 157-159 (3 pages)
Funding The National Grand Fundamental Research 973 Program of China (Grant No. 2004CB318109); Knowledge Innovation Program of the Chinese Academy of Sciences (No. 20056550).
Keywords new words; context variety; inside word probability; double character coupling
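The filtering pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so every definition here is an assumption made for illustration, not the authors' method: context variety is taken as the number of distinct characters seen immediately left and right of a string; the head (tail) position probability of a character is taken as the share of lexicon words containing it in which it comes first (last); the thresholds, the function names, and the toy Latin-letter data standing in for Chinese text are all hypothetical. The double character coupling filter is left out because the abstract does not say how it is computed.

```python
from collections import Counter


def repeated_strings(text, min_len=2, max_len=4, min_count=2):
    """Count substrings of length min_len..max_len; keep the repeated ones."""
    counts = Counter(
        text[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {s: c for s, c in counts.items() if c >= min_count}


def context_variety(text, s):
    """Return (left, right): distinct characters adjacent to occurrences of s."""
    left, right = set(), set()
    start = text.find(s)
    while start != -1:
        if start > 0:
            left.add(text[start - 1])
        end = start + len(s)
        if end < len(text):
            right.add(text[end])
        start = text.find(s, start + 1)  # step by 1 so overlaps are counted
    return len(left), len(right)


def position_probabilities(lexicon):
    """Assumed definition: share of words containing ch where ch is first/last."""
    head, tail, total = Counter(), Counter(), Counter()
    for word in lexicon:
        if not word:
            continue
        head[word[0]] += 1
        tail[word[-1]] += 1
        total.update(set(word))  # each word counts once per distinct character

    def head_prob(ch):
        return head[ch] / total[ch] if total[ch] else 0.0

    def tail_prob(ch):
        return tail[ch] / total[ch] if total[ch] else 0.0

    return head_prob, tail_prob


def filter_candidates(text, lexicon, min_variety=2, min_pos_prob=0.4):
    """Apply the filters in sequence; thresholds are illustrative assumptions."""
    head_prob, tail_prob = position_probabilities(lexicon)
    known = set(lexicon)
    kept = []
    for s in repeated_strings(text):
        if s in known:
            continue  # already in the lexicon, so not a new word
        left, right = context_variety(text, s)
        if left < min_variety or right < min_variety:
            continue  # context too uniform: likely a fragment of a longer unit
        if head_prob(s[0]) < min_pos_prob or tail_prob(s[-1]) < min_pos_prob:
            continue  # first or last character rarely occupies that position
        kept.append(s)
    return sorted(kept)


# Toy data: Latin letters stand in for Chinese characters.
text = "xabyzabqwabx"
lexicon = ["ac", "ad", "ba", "cb"]
result = filter_candidates(text, lexicon)
print(result)
```

In this sketch a character that never appears in the lexicon gets probability 0.0, so candidates that begin or end with an unseen character are discarded. A real system would probably smooth these estimates instead.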

References (4)

  • 1 Zou Gang, Liu Yang, Liu Qun, Meng Yao, Yu Hao, Nishino Fumihito, Kang Shiyong. Chinese new word detection for the Internet[J]. Journal of Chinese Information Processing, 2004, 18(6):1-9. Cited by: 59
  • 2 Cui Shiqi, Liu Qun, Meng Yao, Yu Hao, Nishino Fumihito. New word detection based on a large-scale corpus[J]. Journal of Computer Research and Development, 2006, 43(5):927-932. Cited by: 32
  • 3 Chen KehJiann, Bai MingHong. Unknown word detection for Chinese by a corpus-based learning method[J]. International Journal of Computational Linguistics and Chinese Language Processing, 1998, 3(1):27-44.
  • 4 Li Hongqiao, Huang Changning, Gao Jianfeng, et al. The use of SVM for Chinese new word identification[C]//Proceedings of First International Joint Conference on Natural Language Processing, Sanya, Hainan Island, China, 2004:497-504.

Secondary references (12)

  • 1 Zou Gang, Liu Yang, Liu Qun, Meng Yao, Yu Hao, Nishino Fumihito, Kang Shiyong. Chinese new word detection for the Internet[J]. Journal of Chinese Information Processing, 2004, 18(6):1-9. Cited by: 59
  • 2 Hua-Ping Zhang, Qun Liu, et al. Chinese Name Entity Recognition Using Role Model[J]. Special issue "Word Formation and Chinese Language Processing" of the International Journal of Computational Linguistics and Chinese Language Processing, 2003, 8(2):2
  • 3 Craig G. Nevill-Manning, Ian H. Witten. Identifying Hierarchical Structure in Sequences: A linear-time algorithm[J]. Journal of Artificial Intelligence Research, 1997, 7:67-82
  • 4 K.J. Chen, Ming-Hong Bai. Unknown word detection for Chinese by a corpus-based learning method. International Journal of Computational Linguistics and Chinese Language Processing, 1998, 3(1):27-44
  • 5 K.J. Chen, W.Y. Ma. Unknown word extraction for Chinese documents. The 19th COLING 2002, Taipei, 2002
  • 6 Jianfeng Gao, Mu Li, Andi Wu, et al. Chinese word segmentation: A pragmatic approach. Microsoft Research, Technical Report: MSR-TR-2004-123, 2004
  • 7 Nie Jian-Yun, Wanying Jin, Marie-Louise Hannan. A hybrid approach to unknown word detection and segmentation of Chinese. Int'l Conf. Chinese Computing, Singapore, 1994
  • 8 Hua-Ping Zhang, Qun Liu, Hao Zhang, et al. Automatic recognition of Chinese unknown words based on roles tagging. The 1st SIGHAN Workshop on Chinese Language Processing, Taipei, 2002
  • 9 Andi Wu, Zixin Jiang. Statistically-enhanced new word identification in a rule-based Chinese system. The 2nd Chinese Language Processing Workshop, Hong Kong, 2000
  • 10 Fuchun Peng, Fangfang Feng, Andrew McCallum. Chinese segmentation and new word detection using conditional random fields. COLING 2004, Geneva, Switzerland, 2004

Documents sharing references (77)

Co-cited documents (196)

Citing documents (24)

Secondary citing documents (175)
