Abstract
Phrase analysis in natural language processing systems requires both word sense disambiguation and structural disambiguation, and semantic collocations play an important role in both. This paper proposes a corpus-based method for automatically acquiring semantic collocation rules for Chinese phrases. The method uses HowNet as its semantic knowledge resource and works on a Chinese phrase corpus annotated with syntactic and semantic information. It first applies metarule-guided cross-level association rule mining, a data mining technique, to discover semantic collocation patterns in Chinese phrases automatically. It then selects the best candidates according to the statistical results and generates a semantic collocation rule base. Experimental results show that the method is feasible and that the automatically acquired rules perform well in disambiguation.
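The mining step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `HYPERNYM` hierarchy stands in for HowNet and is not HowNet data, the metarule is reduced to the single template "verb class -> object class", and the final selection is approximated by plain support and confidence thresholds, since the abstract does not state the actual selection criteria.

```python
from collections import Counter
from itertools import product

# Hypothetical toy hierarchy (child -> parent). NOT actual HowNet data.
HYPERNYM = {
    "apple": "fruit", "pear": "fruit", "fruit": "food",
    "rice": "food", "food": "thing",
    "book": "publication", "publication": "thing",
    "eat": "consume", "consume": "act",
    "read": "perceive", "perceive": "act",
}


def ancestors(word):
    """Return the word followed by all of its ancestors, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def mine_rules(pairs, min_support=2, min_confidence=0.65):
    """Mine rules that match the metarule: verb_class -> object_class.

    Each (verb, noun) pair is generalized to every combination of ancestor
    levels, so a rule may join a specific verb with a general noun class
    (this is the cross-level part). Thresholds are illustrative assumptions.
    """
    pair_count = Counter()
    verb_count = Counter()
    for verb, noun in pairs:
        verb_levels = ancestors(verb)
        noun_levels = ancestors(noun)
        for v in verb_levels:
            verb_count[v] += 1
        for v, n in product(verb_levels, noun_levels):
            pair_count[(v, n)] += 1

    rules = {}
    for (v, n), support in pair_count.items():
        confidence = support / verb_count[v]
        if support >= min_support and confidence >= min_confidence:
            rules[(v, n)] = (support, confidence)
    return rules


# Hypothetical verb-object pairs, as if read from an annotated phrase corpus.
corpus = [
    ("eat", "apple"), ("eat", "pear"), ("eat", "rice"),
    ("read", "book"), ("read", "book"),
]
rules = mine_rules(corpus)
for (verb_class, noun_class), (support, confidence) in sorted(rules.items()):
    print(f"{verb_class} -> {noun_class}: support={support}, confidence={confidence:.2f}")
```

In this sketch a pair seen only once, such as ("eat", "apple"), is dropped, while its generalization ("eat", "fruit") survives because two corpus pairs support it. Very general rules such as ("eat", "thing") also pass the thresholds; a real selection step would need an additional criterion to prefer the more informative level.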
Source
Journal of Xiamen University (Natural Science)
CAS
CSCD
PKU Core Journal
2007, No. 3, pp. 331-336 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (60373080)
Keywords
semantic rules
corpus
association rules
HowNet