Abstract
In Chinese information processing, deciding which word strings belong in a word-segmentation lexicon (《分词词表》) has long been a difficult problem. Mutual information, as one measure, reflects to some extent how tightly the component parts of a string are bound together. Using the Peking University annotated corpus of the People's Daily (《人民日报》) for January 1998 as test material, this paper computes mutual information to analyse the likelihood that four-character strings form words, and so provides a basis for deciding whether such strings should be entered in the lexicon.
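As a rough illustration of the kind of measure the abstract refers to, the following sketch computes pointwise mutual information for a four-character string split into two two-character halves. It is only a sketch under stated assumptions: the 2+2 split, the function names `ngram_counts` and `pmi`, and the toy text are all hypothetical, and the abstract does not say how the paper itself decomposes strings or estimates probabilities.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count every overlapping character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pmi(left, right, text):
    """Pointwise mutual information of the string left + right:
    log2( P(left + right) / (P(left) * P(right)) ).
    Probabilities are relative frequencies among n-grams of the same length
    (an assumption made for this sketch, not taken from the paper).
    """
    def prob(s):
        total = len(text) - len(s) + 1
        if total <= 0:
            return 0.0
        return ngram_counts(text, len(s))[s] / total

    p_whole = prob(left + right)
    if p_whole == 0.0:
        # The combined string never occurs, so the parts never bind.
        return float("-inf")
    # If the whole string occurs, both parts occur, so no division by zero.
    return math.log2(p_whole / (prob(left) * prob(right)))


# Hypothetical toy text, not drawn from the People's Daily corpus.
toy_text = "改革开放政策。坚持改革开放。改革开放以来"
score = pmi("改革", "开放", toy_text)
```

A higher score suggests the two halves co-occur more often than chance would predict, which is the sense in which mutual information indicates how tightly a string's parts are bound. Any threshold for admitting a string to the lexicon would have to be set empirically.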
Source
《电脑开发与应用》
2005, No. 1, pp. 2-3, 6 (3 pages in total)
Computer Development & Applications
Funding
Supported by the National 973 Program (G1998030501A-04)