Chinese Word Boundary Ambiguity and Unknown Word Resolution Using Unsupervised Methods (cited by: 1)

Abstract: This paper describes an unsupervised framework that partially resolves four issues in developing a robust Chinese segmentation system: ambiguity, unknown words, knowledge acquisition, and efficient algorithms. It first proposes a statistical segmentation model that integrates the simplified character juncture model (SCJM) with word formation power. The advantage of this model is that it uses both the affinity of characters inside or outside a word and word formation power for disambiguation, and all of its parameters can be estimated in an unsupervised way. After investigating the difference between the real and theoretical sizes of the segmentation space, the paper applies the A* algorithm to perform segmentation without exhaustively searching all potential segmentations. Finally, an unsupervised version of Chinese word-formation patterns for detecting unknown words is presented. Experiments show that the proposed methods are efficient.
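To illustrate what it means to run A* over the segmentation space instead of enumerating every segmentation, here is a minimal sketch. It does not reproduce the paper's model: the SCJM and word formation power scores are replaced by plain unigram costs, and the lexicon, its probabilities, `UNKNOWN_COST`, and the function names are all hypothetical values chosen for the example.

```python
import heapq
import math

# Hypothetical toy lexicon with made-up probabilities (not taken from the paper).
LEXICON = {
    "研究": 0.2, "研究生": 0.1, "生命": 0.15, "命": 0.05,
    "的": 0.3, "起源": 0.1, "生": 0.05, "起": 0.03, "源": 0.02,
}
UNKNOWN_COST = 10.0  # assumed flat penalty for one out-of-lexicon character
MAX_WORD_LEN = max(len(w) for w in LEXICON)
# Cheapest possible cost of any single word; used by the heuristic below.
MIN_COST = min(min(-math.log(p) for p in LEXICON.values()), UNKNOWN_COST)


def word_cost(word):
    """Negative log probability; unknown single characters get a flat penalty."""
    if word in LEXICON:
        return -math.log(LEXICON[word])
    return UNKNOWN_COST if len(word) == 1 else math.inf


def heuristic(remaining):
    """Lower bound on the cost of covering `remaining` characters.

    At least ceil(remaining / MAX_WORD_LEN) more words are needed and each
    costs at least MIN_COST, so this never overestimates (admissible).
    """
    return math.ceil(remaining / MAX_WORD_LEN) * MIN_COST


def astar_segment(text):
    """Return the lowest-cost segmentation of `text` as a list of words."""
    n = len(text)
    # Queue entries: (f = g + h, g, position, words chosen so far)
    frontier = [(heuristic(n), 0.0, 0, [])]
    best_g = {0: 0.0}
    while frontier:
        _, g, pos, words = heapq.heappop(frontier)
        if pos == n:
            return words  # first goal popped is optimal for this heuristic
        if g > best_g.get(pos, math.inf):
            continue  # stale entry: a cheaper path to pos was already found
        for end in range(pos + 1, min(n, pos + MAX_WORD_LEN) + 1):
            cost = word_cost(text[pos:end])
            if math.isinf(cost):
                continue  # multi-character string that is not a known word
            new_g = g + cost
            if new_g < best_g.get(end, math.inf):
                best_g[end] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + heuristic(n - end), new_g, end, words + [text[pos:end]]),
                )
    return []
```

With this toy lexicon, `astar_segment("研究生命的起源")` returns 研究 / 生命 / 的 / 起源 rather than the competing reading that starts with 研究生 / 命, because the first path has the lower total cost. The search stops as soon as a complete segmentation is popped from the queue, so partial segmentations that are already too expensive are never expanded.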
Author: 傅国宏
Source: High Technology Letters (高技术通讯(英文版)), EI, CAS, 2000, No. 2, pp. 29-39 (11 pages)
Funding: Supported by the National Natural Science Foundation and the High Technology Research and Development Programme of China
Keywords: word segmentation; character juncture; word formation pattern; algorithms; data structures; image analysis; image quality; image segmentation; mathematical models