Basic Tenets of Classification Algorithms K-Nearest-Neighbor, Support Vector Machine, Random Forest and Neural Network: A Review (cited by: 2)



Abstract: This paper reviews sixty-eight research articles published between 2000 and 2017, together with textbooks, that employed four classification algorithms as their main statistical tools: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN). The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths and shortcomings of each algorithm were examined, and a conclusion was drawn on which one performs better. The literature reviewed indicates that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand, but it becomes significantly slower as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm. However, because of its time-consuming parameter tuning procedure, its high computational complexity, the numerous NN architectures to choose from, and the large number of algorithms used for training, most researchers recommend SVM and RF as easier, widely used methods that repeatedly achieve high accuracy and are often faster to implement.
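The two KNN drawbacks named in the abstract (slower prediction as the data grows, and a hard-to-set K) can be illustrated with a minimal brute-force sketch. This is not code from the reviewed paper; the function name `knn_predict`, the toy points and the labels are all hypothetical, and Euclidean distance with majority voting is assumed.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration only (brute-force search).
    """
    # Each prediction computes a distance to every training point, so the
    # cost of a single query grows with the size of the training set.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up): three points of class "a" near the origin,
# four points of class "b" near (1, 1).
train_X = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (1.2, 1.0),
]
train_y = ["a", "a", "a", "b", "b", "b", "b"]

query = (0.15, 0.15)
pred_k3 = knn_predict(train_X, train_y, query, k=3)  # votes: 3 x "a"
pred_k7 = knn_predict(train_X, train_y, query, k=7)  # votes: 3 x "a", 4 x "b"
```

With the same query point, `k=3` uses only the three nearby "a" points, while `k=7` lets the larger "b" class outvote them, which is one way to see why the choice of K matters.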

Authors: Ernest Yeboah Boateng; Joseph Otoo; Daniel A. Abaye (Department of Basic Sciences, School of Basic and Biomedical Sciences, University of Health and Allied Sciences, Ho, Ghana; Department of Statistics and Actuarial Science, University of Ghana, Accra, Ghana)

Affiliations: Department of Basic Sciences; Department of Statistics and Actuarial Science

Source: Journal of Data Analysis and Information Processing, 2020, Issue 4, pp. 341-357 (17 pages)

Keywords: Classification Algorithms; Non-Parametric; K-Nearest-Neighbor; Neural Networks; Random Forest; Support Vector Machines

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

