An Improved Chinese Part-of-Speech (POS) Tagging System (cited by: 7)
Abstract: The key difficulty in Chinese part-of-speech (POS) tagging is determining, in context, the POS of words that belong to more than one word class (multi-category words). Because multi-category words make up only a small fraction of the dictionary (about 3%), this paper proposes a hidden Markov model with dual states. In addition to a regular state transition probability matrix, the model logically maintains a dedicated state transition probability matrix for each multi-category word. As a result, the probability of moving from one state to another is no longer independent of the observation, which improves the accuracy of the model.
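The idea in the abstract can be illustrated with a small Viterbi decoder that looks up a word-specific transition table when one exists and falls back to the shared table otherwise. This is only a sketch under stated assumptions, not the authors' implementation: the function name `viterbi_dual`, the dictionary layout, the choice to let the current word's table govern the transition into its tag, and every probability below are hypothetical and chosen purely for illustration.

```python
import math

FLOOR = 1e-12  # avoids log(0) for unseen events (an assumption of this sketch)


def lg(p):
    return math.log(max(p, FLOOR))


def viterbi_dual(words, tags, init, trans, emit, word_trans=None):
    """Viterbi decoding in which a word may override the shared transition table.

    trans and each word_trans[w] map (prev_tag, tag) -> probability;
    emit maps (tag, word) -> probability; init maps tag -> probability.
    """
    word_trans = word_trans or {}
    score = {t: lg(init.get(t, 0.0)) + lg(emit.get((t, words[0]), 0.0)) for t in tags}
    backpointers = []
    for w in words[1:]:
        # Assumption: a multi-category word uses its own table, all others share one.
        table = word_trans.get(w, trans)
        new_score, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: score[p] + lg(table.get((p, t), 0.0)))
            new_score[t] = (score[prev] + lg(table.get((prev, t), 0.0))
                            + lg(emit.get((t, w), 0.0)))
            ptr[t] = prev
        score = new_score
        backpointers.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=score.get)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy data (all numbers invented): "领导" can be a noun (N) or a verb (V).
tags = ["N", "V"]
words = ["公司", "领导"]
init = {"N": 0.8, "V": 0.2}
trans = {("N", "N"): 0.4, ("N", "V"): 0.6, ("V", "N"): 0.7, ("V", "V"): 0.3}
emit = {("N", "公司"): 0.5, ("N", "领导"): 0.1, ("V", "领导"): 0.1}
word_trans = {"领导": {("N", "N"): 0.7, ("N", "V"): 0.3,
                      ("V", "N"): 0.6, ("V", "V"): 0.4}}

print(viterbi_dual(words, tags, init, trans, emit))              # ['N', 'V']
print(viterbi_dual(words, tags, init, trans, emit, word_trans))  # ['N', 'N']
```

With the shared table alone, the ambiguous word is tagged as a verb. Once it has its own table, the transition probability depends on the observed word and the tag changes to a noun. Since only about 3% of dictionary entries are multi-category, `word_trans` would stay small relative to the vocabulary.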
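Authors: 屈刚 (Qu Gang), 陆汝占 (Lu Ruzhan)
Source: Journal of Shanghai Jiaotong University (《上海交通大学学报》), 2003, No. 6, pp. 897-900 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).
Keywords: part-of-speech (POS) tagging; hidden Markov model; natural language processing (NLP)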