Abstract
Automatic acquisition of verb subcategorization typically involves two steps: (1) generating subcategorization hypotheses according to heuristic rules, and (2) filtering the hypothesis set with statistical methods to select reliable subcategorization types. Previous work on improving acquisition performance has concentrated on the statistical filtering stage, and the hypothesis generation stage in the related experiments involved no supervised training, so all of these methods are unsupervised. This paper proposes a weakly supervised scheme for the automatic acquisition of Chinese verb subcategorization, in which an SVM classifier replaces the unsupervised hypothesis generation module of the original system. Experimental results show a statistically significant improvement in final acquisition performance.
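The two-step procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the toy corpus, the frame labels, and the relative-frequency threshold used as the statistical filter are all hypothetical and are not taken from the paper, which does not specify its filter here.

```python
from collections import Counter, defaultdict


def generate_hypotheses(observations, propose_frame):
    """Step 1: count the frame proposed for each (verb, context) observation."""
    counts = defaultdict(Counter)
    for verb, context in observations:
        frame = propose_frame(verb, context)
        if frame is not None:
            counts[verb][frame] += 1
    return counts


def filter_hypotheses(counts, min_rel_freq=0.2):
    """Step 2: keep frames whose relative frequency reaches the threshold.

    Assumption: a simple relative-frequency cutoff stands in for the
    statistical filter; the paper's actual filter may differ.
    """
    accepted = {}
    for verb, frame_counts in counts.items():
        total = sum(frame_counts.values())
        accepted[verb] = {
            frame for frame, n in frame_counts.items() if n / total >= min_rel_freq
        }
    return accepted


def toy_propose_frame(verb, context):
    # Hypothetical heuristic: join the argument labels seen after the verb.
    return "+".join(context) if context else "INTRANS"


# Hypothetical toy data: 12 observations of one verb.
toy_corpus = [("eat", ["NP"])] * 8 + [("eat", [])] * 3 + [("eat", ["NP", "NP"])]

counts = generate_hypotheses(toy_corpus, toy_propose_frame)
accepted = filter_hypotheses(counts, min_rel_freq=0.2)
```

In this sketch, the weakly supervised variant would correspond to passing a trained classifier's prediction function as `propose_frame` in place of the heuristic, leaving the filtering step unchanged.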
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University Chinese Core Journals)
2006, No. 28, pp. 9-11, 27 (4 pages)
Computer Engineering and Applications
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60373101)
Keywords
Chinese verbs, subcategorization, weakly supervised, SVM