Abstract
Big data research in any field relies on machine learning methods for feature extraction. To identify feature selection methods that meet the demands of large-scale data analysis, this paper examines the methods commonly used for feature selection in machine learning and groups them into five categories: correlation measures, Lasso sparse selection, ensemble methods, neural networks, and principal component analysis. By comparing the principles, implementation procedures, and application scenarios of these methods, it sets out the scope of applicability, the strengths and weaknesses, and the key considerations for feature selection under each algorithm, as a reference for researchers.
Authors
CUI Hong-yan
XU Shuai
ZHANG Li-feng
Roy E. Welsch
Berthold K. P. Horn
CUI Hong-yan1,2,3, XU Shuai1,2,3, ZHANG Li-feng1,2,3, Roy E. Welsch4, Berthold K. P. Horn5 (1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China; 3. Beijing Laboratory of Advanced Information Networks, Beijing 100876, China; 4. Sloan School of Management, Massachusetts Institute of Technology, MA 02139, USA; 5. Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, MA 02139, USA)
Source
Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》)
EI
CAS
CSCD
PKU Core Journal (北大核心)
2018, No. 1, pp. 1-12 (12 pages)
Funding
Ministry of Education-China Mobile Research Fund (MCM20170306)
Keywords
machine learning
feature selection
transfer learning
generative adversarial networks
artificial intelligence