Abstract
In natural language processing, phrases play a pivotal role in the analysis of Chinese. As a basic constituent unit of the Chinese sentence, the phrase is especially important to both the syntactic and the semantic analysis of the whole sentence. To improve the quality of Chinese analysis, this paper builds on existing algorithms and proposes a phrase recognition (chunking) method that combines rules with statistics. The method first predicts phrase boundaries using the mutual information between words, then adjusts the boundaries according to the lexical and part-of-speech information of the words, and finally performs bracket matching and phrase labeling. Experimental results show that the method improves the recognition rate and accuracy of phrase recognition, and with them the quality of Chinese analysis.
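The boundary-prediction step can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so the sketch assumes pointwise mutual information between adjacent words, estimated from a word-segmented corpus, and predicts a boundary wherever the score falls below a threshold. The function names (`train_counts`, `pmi`, `predict_boundaries`, `bracket`) and the default threshold of 0.0 are hypothetical. The rule-based boundary adjustment and the phrase labeling are not shown.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count words and adjacent word pairs in a list of segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information log2(P(w1, w2) / (P(w1) * P(w2))).

    Returns -inf for a pair never seen adjacent in the corpus (an assumption
    of this sketch; a real system would likely smooth the counts).
    """
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(w1, w2)] / n_bi
    p_x = unigrams[w1] / n_uni
    p_y = unigrams[w2] / n_uni
    return math.log2(p_xy / (p_x * p_y))


def predict_boundaries(sent, unigrams, bigrams, threshold=0.0):
    """Return each index i where a boundary is predicted between sent[i] and sent[i + 1]."""
    return [
        i
        for i in range(len(sent) - 1)
        if pmi(sent[i], sent[i + 1], unigrams, bigrams) < threshold
    ]


def bracket(sent, boundaries):
    """Split a sentence into unlabeled chunks at the predicted boundaries."""
    chunks, start = [], 0
    for i in sorted(boundaries):
        chunks.append(sent[start:i + 1])
        start = i + 1
    chunks.append(sent[start:])
    return chunks
```

Under these assumptions, a sentence is chunked by calling `train_counts` on a segmented corpus, passing the counts to `predict_boundaries`, and feeding the resulting indices to `bracket`. Word pairs that co-occur more often than chance stay in the same chunk, and weakly associated pairs are separated.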
Source
Computer Technology and Development (《计算机技术与发展》)
2008, No. 4, pp. 67-69, 73 (4 pages)
Funding
Natural Science Research Project of Anhui Provincial Higher Education Institutions (KJ2007B159)
(KJ2007B366ZC)
Keywords
NLP
phrase chunking
mutual information