
Automatic Acquisition of Chinese Collocation (自动获取汉语词语搭配)
Cited by: 14
Abstract: As a lexical phenomenon, collocation has important applications in many areas of natural language processing. This paper compares and analyzes four measures of word association and three measures of word structure distribution, and proposes a method for acquiring collocations that combines mutual information and entropy. Experimental results show that, when co-occurrence frequency is high, the four association measures (mutual information, the cosine coefficient, the χ² test, and the likelihood ratio test) are roughly equally effective for identifying collocations, and that entropy outperforms variance and spread for measuring the structural distribution of words. The proposed method relies on fewer measures, makes thresholds easy to choose, and performs on par with existing methods.
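Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2006, No. 6, pp. 31-37 (7 pages)
Funding: National Natural Science Foundation of China (60573074); Shanxi Province Youth Science and Technology Foundation (20031027); Shanxi Province Natural Science Foundation (20041040); Shanxi Province Key Science and Technology Project (051129)
Keywords: computer application; Chinese information processing; collocation; mutual information; entropy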
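The abstract names mutual information and entropy as the two measures the proposed method combines, but this record does not give the formulas or thresholds. The following is only a minimal sketch under stated assumptions: association is scored with pointwise mutual information, "structure distribution" is taken to mean the distribution of co-occurrence counts over relative positions inside a context window, and a pair is accepted when association is high and positional entropy is low. The function names and the threshold values `mi_min` and `entropy_max` are hypothetical and are not taken from the paper.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information in bits: log2(P(x, y) / (P(x) * P(y)))."""
    return math.log2(pair_count * total / (x_count * y_count))


def position_entropy(offset_counts):
    """Shannon entropy in bits of co-occurrence counts over relative positions.

    offset_counts maps a relative position (e.g. -1, +1, +2) to how often
    the second word was seen at that position. Low entropy means the pair
    tends to occur in one fixed arrangement.
    """
    n = sum(offset_counts.values())
    return -sum((c / n) * math.log2(c / n)
                for c in offset_counts.values() if c > 0)


def is_collocation(pair_count, x_count, y_count, total, offset_counts,
                   mi_min=3.0, entropy_max=1.0):
    """Hypothetical decision rule: strong association AND a peaked position profile."""
    strong = pmi(pair_count, x_count, y_count, total) >= mi_min
    fixed = position_entropy(offset_counts) <= entropy_max
    return strong and fixed


if __name__ == "__main__":
    # Illustrative counts only; they do not come from the paper's corpus.
    peaked = {1: 180, 2: 20}
    spread = {-2: 50, -1: 50, 1: 50, 2: 50}
    print(is_collocation(200, 1000, 500, 1_000_000, peaked))
    print(is_collocation(200, 1000, 500, 1_000_000, spread))
```

With only two scores to threshold, a rule of this shape is consistent with the abstract's remark that the method depends on few measures and that thresholds are easy to choose; the actual combination used by the authors may differ.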

References: 15

Secondary references: 38

  • 1. Sun Maosong, Huang Changning, Fang Jie. A Preliminary Quantitative Analysis of Chinese Collocations [J]. Zhongguo Yuwen (中国语文), 1997(1): 29-38. Cited by: 55
  • 2. Nancy Ide, Jean Véronis. Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art [J]. Computational Linguistics, 1998, 1-42.
  • 3. Yarowsky D. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods [A]. In: Proceedings of the 33rd Annual Meeting of ACL [C], Cambridge, Massachusetts, USA, 1995, 181-188.
  • 4. Hoa Trang Dang, Ching-yi Chia. Simple Features for Chinese Word Sense Disambiguation [A]. In: Proceedings of COLING-2002 [C]. Philadelphia, USA, 2002, 88-94.
  • 5. Lesk M. Automatic Sense Disambiguation: How to Tell a Pine Cone from an Ice Cream Cone. Proceedings of the 1986 SIGDOC Conference, Association for Computing Machinery, New York, 1986.
  • 6. M Benson, E Benson, R Ilson. The BBI Combinatory Dictionary of English: A Guide to Word Combinations [M]. John Benjamins Publishing Company, 1986.
  • 7. Che Wanxiang. Research on Collocation Extraction Methods for Dependency Grammar Parsing. In: Natural Language Understanding and Machine Translation [M]. Tsinghua University Press, 2001.
  • 8. Sun Honglin. Distributional Characteristics of Collocations in Text [C]. In: Huang Changning (ed.), Proceedings of the 1998 International Conference on Chinese Information Processing [C]. Tsinghua University Press.
  • 9. Gao Huixuan. Statistical Computing [M]. Peking University Press, 1997.
  • 10. Smadja F. Retrieving Collocations from Text: Xtract [J]. Computational Linguistics, 1993, 19(1): 143-177.

Documents sharing references: 88

Co-cited documents: 119

Citing documents: 14

Secondary citing documents: 53
