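Abstract
The word bank and the rule bank are the two sub-banks that play a foundational and central role in the "Knowledge Base of Sense Collocations for Polysemous Words". The word bank draws on two sources: dictionary words and words from authentic corpora. The two kinds of words differ in several respects: written versus colloquial words, standard versus variant forms, language words versus speech words, general versus domain-specific words, and stable versus specific words. The characteristics of the word bank strongly affect the effectiveness and accuracy of word sense tagging. The first batch of words examined comprises 3,771 disyllabic polysemous words with a total of 7,861 senses. The rule bank governs the semantic bank, the sense bank, and the corpus; these knowledge bases take effect through the organization that the rule bank provides. The rule bank is the direct basis for achieving the engineering goal of word sense tagging: for any polysemous word, whether rules are defined, how many there are, and how good they are all directly affect the tagging result. The rule bank embodies the significance and value of the SCT system as a whole and is the crystallization of linguistic knowledge and engineering practice.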
The formal features of a sense are the forms of expression and the collocational environment that carry its meaning, and they serve as the markers by which a computer recognizes the sense. They are reflected mainly in three aspects: part of speech, syntactic function, and semantic features. This paper discusses the types, characteristics, and effectiveness of formal features, investigates how to extract them, and explores the differences between senses in printed dictionaries and senses in electronic dictionaries. Focusing on the relationship between formal features and meaning, the paper suggests that the formal features of a sense should be emphasized on the basis of differences in meaning, while the tendency toward formal determinism should be avoided.
Source
《语言文字应用》
CSSCI
PKU Core Journal
2014, No. 1, pp. 20-28 (9 pages)
Applied Linguistics
Keywords
formal features of a sense
senses from printed dictionaries
word sense base for computers