
Chinese Word Segmentation Method Based on Dictionary and Frequency of the Words (基于词典和词频的中文分词方法)

Cited by: 19
Abstract: Chinese word segmentation is the prerequisite and foundation of Chinese information processing. Because Chinese sentences contain no explicit delimiters between words, and because polyphonic and polysemous words occur, segmentation ambiguity is unavoidable; word segmentation has therefore become the "bottleneck" of Chinese information processing. This paper implements automatic Chinese word segmentation by decomposing sentences word by word against a dictionary annotated with word frequencies, and improves segmentation accuracy through word-frequency calculation and ambiguity elimination.
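The abstract does not spell out the exact decomposition or disambiguation rules, so the following is only a minimal sketch of one common way to combine a frequency-annotated dictionary with ambiguity resolution: a unigram maximum-probability segmenter. This is a swapped-in technique, not necessarily the paper's method. The dictionary `FREQ`, its counts, the fallback count `UNKNOWN`, and the function `segment` are all hypothetical. At each position the sketch considers every dictionary word starting there (plus a single-character fallback) and uses dynamic programming to keep the segmentation whose word frequencies give the highest total log-probability. With the toy counts below, that is how it prefers 研究/生命/起源 over the ambiguous 研究生/命/起源.

```python
import math

# Hypothetical toy dictionary: word -> frequency count (illustrative values, not from the paper).
FREQ = {"研究": 50, "研究生": 20, "生命": 40, "命": 5, "起源": 30, "生": 10}
TOTAL = sum(FREQ.values())
MAX_LEN = max(len(w) for w in FREQ)  # longest dictionary word bounds the candidate search
UNKNOWN = 0.5  # assumed smoothed count for characters missing from the dictionary


def segment(sentence):
    """Return the segmentation of `sentence` with the highest unigram log-probability."""
    n = len(sentence)
    # best[i] holds (score, words) for the best segmentation of sentence[i:].
    best = [None] * (n + 1)
    best[n] = (0.0, [])
    for i in range(n - 1, -1, -1):
        # Enumerate every candidate word that starts at position i.
        for j in range(i + 1, min(n, i + MAX_LEN) + 1):
            word = sentence[i:j]
            if word in FREQ:
                logp = math.log(FREQ[word] / TOTAL)
            elif j == i + 1:
                # Out-of-vocabulary single character, so every sentence stays segmentable.
                logp = math.log(UNKNOWN / TOTAL)
            else:
                continue
            score = logp + best[j][0]
            if best[i] is None or score > best[i][0]:
                best[i] = (score, [word] + best[j][1])
    return best[0][1]


print(segment("研究生命起源"))  # ['研究', '生命', '起源']
```

Summing log-probabilities rather than multiplying raw frequencies avoids floating-point underflow on long sentences. A practical system would also need a far larger dictionary and better handling of unknown words than the single-character fallback assumed here.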
Source: 《微计算机信息》 (Control & Automation), 2008, No. 3, pp. 239-240, 232 (3 pages). Listed as a Peking University core journal.
Funding: Shaanxi Province Science and Technology Program (2004k05-G40)
Keywords: Chinese word segmentation; ambiguity elimination; word frequency