Abstract
Because the data are large in scale, highly correlated, and nonlinear, selecting the essential features of the raw data is key to improving the generalization ability of a question classifier. This paper discusses existing feature selection methods based on all features, on bag-of-words, and on bag-of-word-sequences, and proposes a feature selection approach that combines random forest with a support vector machine (SVM). Experiments show that the method is simple and effective at selecting classification features, and that it improves the efficiency and accuracy of question classification.
Source
Journal of University of Science and Technology Liaoning (《辽宁科技大学学报》)
CAS
2016, No. 2, pp. 146-152 (7 pages)
Keywords
support vector machine
random forest
feature selection