Abstract
In person relation extraction, the feature space is often very high-dimensional, which leads to sparse vectors and reduces extraction efficiency. To address this, person relations are first divided into six categories. Four feature selection algorithms from text classification (document frequency, information gain, mutual information, and the χ² statistic) are then introduced to reduce the dimensionality of the feature space. Finally, an SVM classifier is used to extract the relations between person entities. Experimental results show that these four feature selection algorithms not only preserve extraction performance but also effectively reduce the dimensionality of the vector space and greatly improve extraction efficiency. Among them, the χ² statistic performs best, followed by information gain.
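As a rough illustration of the χ² feature selection step named in the abstract, the sketch below scores each feature term with the standard 2×2 contingency form of the χ² statistic used in text classification and keeps the top k terms. It is a minimal sketch, not the authors' implementation: the function names `chi2_score` and `select_features`, the choice of taking the maximum score over classes, and the alphabetical tie-break are all assumptions made for this example. The SVM classification step is not shown.

```python
from collections import Counter


def chi2_score(a, b, c, d):
    """Chi-square statistic for one term/class pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class carries no information
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest chi-square score.

    Assumption for this sketch: a term's overall score is its maximum
    score over all classes; ties are broken alphabetically.
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    class_count = Counter(labels)
    term_count = Counter(t for s in doc_sets for t in s)
    term_class = Counter((t, y) for s, y in zip(doc_sets, labels) for t in s)

    scores = {}
    for term, tc in term_count.items():
        best = 0.0
        for label, cc in class_count.items():
            a = term_class[(term, label)]
            b = tc - a
            c = cc - a
            d = n - a - b - c
            best = max(best, chi2_score(a, b, c, d))
        scores[term] = best

    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

Each document is passed as a list of feature tokens and each label as a relation category; the reduced term list would then define the vector space fed to a classifier such as an SVM.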
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal list (北大核心)
2015, No. 3, pp. 254-259 (6 pages)
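Funding
National Natural Science Foundation of China (61363072)
Ministry of Education Humanities and Social Sciences Fund (11YJC740157)
Natural Science Foundation of Jiangxi Province (20114BAB201027)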
Keywords
relation extraction
SVM
feature selection
multi-class classification