Abstract
Based on statistics from a corpus of 12 million characters, we find that derived words account for about 8.66% of all word entries and that 188 affixes are used to form them. Among these affixes, the suffix zhe (者) yields the largest number of derived-word entries, and its word-forming components are the most complex. We automatically recognize derived words with the suffix zhe using a hybrid strategy that combines a basic word list and rules based on lexical-instance knowledge with word collocation and co-occurrence frequency. Precision is 93.06% in the closed test and 82.40% in the open test.
Source
《语言文字应用》
CSSCI
Peking University Core Journals (北大核心)
2006, No. 2, pp. 139-144 (6 pages)
Applied Linguistics