Abstract
A growing body of practice shows that lexical knowledge will be an indispensable component of future natural language processing systems. This paper presents a method for automatically extracting lexical knowledge, using a machine-readable dictionary as the resource. First, definition entries are classified. Next, extraction patterns for lexical knowledge are generated automatically from an analysis of the definitions. Finally, lexical knowledge is extracted by pattern matching. The extracted results are then filtered with a supervised machine learning method based on the maximum entropy model. Applied to the Applied Chinese Dictionary (《应用汉语词典》), the method achieved good extraction results.
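The method summarized above is a pipeline: classify definitions, match extraction patterns against them to produce candidate knowledge, then filter the candidates with a maximum entropy classifier. The Python sketch below illustrates only the shape of that pipeline, under clearly stated assumptions. The dictionary entries, the two pattern names (`kind_of`, `refers_to`), the features and the training labels are all hypothetical. The patterns are written by hand here rather than generated automatically as the paper describes. The maximum entropy filter is implemented as binary logistic regression, to which a two-class conditional maximum entropy model over indicator features reduces, trained by gradient ascent with an L2 (Gaussian prior) penalty. This is not the authors' implementation.

```python
import math
import re

# Hypothetical headword -> definition entries; illustrative only, not data from the paper.
ENTRIES = {
    "苹果": "一种落叶乔木的果实,圆形,味甜。",
    "钢琴": "一种键盘乐器,用手指按键发音。",
    "跑步": "指按一定速度向前跑的运动。",
}

# Hypothetical hand-written patterns, one per definition class (the paper generates
# its patterns automatically). Each captures the segment after the last "的" and
# before the first comma/period as a candidate hypernym of the headword.
PATTERNS = {
    "kind_of": re.compile(r"^一种(?:[^,。]*的)?([^,。的]+)[,。]"),
    "refers_to": re.compile(r"^指(?:[^,。]*的)?([^,。的]+)[,。]"),
}


def extract_candidates(entries):
    """Match every pattern against every definition; return (headword, pattern, hypernym)."""
    candidates = []
    for head, definition in entries.items():
        for name, pattern in PATTERNS.items():
            m = pattern.match(definition)
            if m:
                candidates.append((head, name, m.group(1)))
    return candidates


def features(candidate):
    """Hypothetical indicator features for one candidate triple."""
    _, pattern_name, hyper = candidate
    length = "len<=4" if len(hyper) <= 4 else "len>4"
    return ["bias", "pattern=" + pattern_name, length]


def prob(weights, feats):
    """P(candidate is correct) = sigmoid(sum of the active feature weights)."""
    score = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def train_maxent(data, epochs=1000, lr=0.1, l2=0.01):
    """Fit a binary maxent (logistic) model by batch gradient ascent on the
    L2-penalized conditional log-likelihood."""
    weights = {}
    for _ in range(epochs):
        grad = {}
        for feats, label in data:
            p = prob(weights, feats)
            for f in feats:
                grad[f] = grad.get(f, 0.0) + (label - p)
        for f, g in grad.items():
            w = weights.get(f, 0.0)
            weights[f] = w + lr * (g - l2 * w)
    return weights


# Hypothetical labeled candidates (1 = correct hypernym, 0 = noise).
TRAIN = [
    (features(("梨", "kind_of", "水果")), 1),
    (features(("小提琴", "kind_of", "弦乐器")), 1),
    (features(("游泳", "refers_to", "体育运动")), 1),
    (features(("甲", "kind_of", "用于日常生活中")), 0),
    (features(("乙", "refers_to", "在很远地方出现")), 0),
]


def filter_candidates(candidates, weights, threshold=0.5):
    """Keep candidates whose predicted probability of being correct meets the threshold."""
    return [c for c in candidates if prob(weights, features(c)) >= threshold]


if __name__ == "__main__":
    model = train_maxent(TRAIN)
    for cand in filter_candidates(extract_candidates(ENTRIES), model):
        print(cand)
```

Keeping pattern matching and filtering as separate stages mirrors the two-step structure in the abstract: patterns favor recall, and the trained classifier restores precision. In a real system the feature set would be far richer than the single length bucket used here.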
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2008, No. 6, pp. 8-10 (3 pages)
Funding
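Supported by the Major Program of the National Natural Science Foundation of China, "Basic Theory and Core Technology of Non-canonical Knowledge" (60496326).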
Keywords
Lexical knowledge
Machine-readable dictionary
Pattern extraction
Maximum entropy