Abstract
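Pattern-based knowledge acquisition is one of the main research topics in knowledge acquisition from text, and obtaining textual knowledge patterns is an important part of that research. This paper proposes a new method for acquiring patterns based on prepositions and verbs (PV patterns). Candidate verb-preposition combinations (PV combinations) are first constructed and then filtered with statistical methods. The quality of a PV combination is measured by two criteria: the expressive power of the pattern words, and the relevance between pattern words and concept words as well as among multiple concept words. Six numeric features were built from these two criteria and used to train three classifiers, whose precision, estimated by cross-validation, reached 0.853, 0.862 and 0.856, respectively. These classifiers provide a basis for automatically selecting PV patterns from PV combinations.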
Pattern-based knowledge acquisition is an important area of research on knowledge acquisition from text (KAT). One topic in this area is how to harvest textual knowledge patterns. A novel method for acquiring preposition-verb patterns (PV patterns) is proposed. First, candidate preposition-verb pairs (PV pairs) are generated and filtered by a combination of a rule-based method and statistical methods. Two criteria are designed to evaluate PV patterns: coverage of instances of semantic relations, and relevance among the concept words and pattern words. These criteria lead to six numeric features for PV patterns. Three classifiers are trained on these six features. Their precision, estimated via cross-validation, reaches 0.853, 0.862 and 0.856, respectively. These classifiers provide a solid basis for automatically selecting PV patterns from PV pairs.
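The evaluation step described above (classifiers trained on six numeric features, with precision estimated via cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the three classifiers or define the six features, so the nearest-centroid classifier, the helper names, and the synthetic data below are hypothetical stand-ins, not the authors' method or results.

```python
import random


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); 0.0 if nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class NearestCentroid:
    """Hypothetical stand-in classifier; the paper's classifiers are not named."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))
                for x in X]


def cross_val_precision(make_clf, X, y, k=5, seed=0):
    """Average precision over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        clf = make_clf().fit([X[i] for i in train], [y[i] for i in train])
        preds = clf.predict([X[i] for i in held_out])
        scores.append(precision([y[i] for i in held_out], preds))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=20, n_features=6, seed=1):
    """Synthetic six-feature vectors (label 1 = PV pattern, 0 = rejected pair).
    Purely illustrative; not data from the paper."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 0.7), (0, 0.3)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.15) for _ in range(n_features)])
            y.append(label)
    return X, y
```

A call such as `cross_val_precision(NearestCentroid, *make_toy_data())` returns a single averaged precision for the toy data. Repeating it with three different classifier classes would mirror the comparison reported in the abstract, although the figures obtained this way say nothing about the published 0.853, 0.862 and 0.856.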
Source
《计算机科学》
CSCD
Peking University Core Journal
2008, No. 11, pp. 139-143 (5 pages)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (60496326, 60573063, 60573064 and 60773059) and the 863 Program (2007AA01Z325).
Keywords
Knowledge acquisition from text
Text pattern acquisition
Pattern classification