Abstract
Feature selection is an important problem in machine learning and pattern recognition. In practical applications, however, feature selection is usually treated separately from feature extraction: existing heuristic algorithms consider only the classification performance of a feature and ignore the cost of extracting it, which lowers the efficiency of the resulting recognition system. This paper proposes a new feature selection criterion that considers classification performance and extraction cost together, using an evaluation function that combines information gain with feature extraction cost, and gives a heuristic algorithm, ECFS, based on it. ECFS is applied to a real-world learning problem and compared with the decision tree algorithm ID3 and a BP neural network. The experimental results show that ECFS maintains recognition accuracy while greatly reducing the time spent on feature extraction and improving recognition speed.
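The idea of weighing information gain against extraction cost can be illustrated with a minimal sketch. The abstract does not give the ECFS evaluation function, so the gain-per-unit-cost ratio below is an assumption made for illustration only, and the names `cost_aware_score` and `select_feature` and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on a discrete feature (as in ID3)."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def cost_aware_score(rows, labels, feature, cost):
    """Assumed criterion (not taken from the paper): gain per unit of extraction cost.

    The cost must be strictly positive.
    """
    return information_gain(rows, labels, feature) / cost


def select_feature(rows, labels, costs):
    """Return the feature with the highest cost-aware score."""
    return max(costs, key=lambda f: cost_aware_score(rows, labels, f, costs[f]))


# Toy data: "costly" separates the classes perfectly but is 10x as expensive to extract.
rows = [
    {"cheap": 0, "costly": 0},
    {"cheap": 0, "costly": 0},
    {"cheap": 1, "costly": 1},
    {"cheap": 0, "costly": 1},
]
labels = ["a", "a", "b", "b"]
costs = {"cheap": 1.0, "costly": 10.0}
chosen = select_feature(rows, labels, costs)
```

With equal costs this ranking reduces to the plain information-gain ranking used by ID3; once costs differ, a less informative but much cheaper feature can be preferred, which is the trade-off the abstract describes.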
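Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journal (北大核心)
1999, No. 7, pp. 788-793 (6 pages)
Funding
National Natural Science Foundation of China
Harbin Institute of Technology university-managed fund (哈工大校管基金)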