Abstract: Feature selection is essential for prioritising important attributes in data to improve prediction quality in machine learning algorithms. As different selection techniques identify different feature sets, relying on a single method may result in risky decisions. The authors propose an ensemble approach using union and quorum combination techniques with five individual selection methods: analysis of variance, variance threshold, sequential backward search, recursive feature elimination, and the least absolute shrinkage and selection operator (LASSO). The proposed method reduces features in three rounds: (i) discard redundant features using pairwise correlation, (ii) let each individual method select its own feature set independently, and (iii) equalise the sizes of the individual feature sets. The equalised individual feature sets are then combined using the union and quorum techniques. Both the combined and the individual sets are tested for network anomaly detection using random forest, decision tree, K-nearest neighbours, Gaussian Naive Bayes, and logistic regression classifiers. Experimental results on the UNSW-NB15 data set show that random forest with the union and quorum feature sets yields F1-scores of 99% and 99.02% with a minimum of 6 and 12 features, respectively. On the NSL-KDD data set, random forest with the union and quorum sets achieves F1-scores of 99.34% and 99.21% with a minimum of 28 and 18 features.
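A minimal sketch of the union/quorum idea using scikit-learn is shown below. The synthetic data, the correlation threshold, the equalised set size k, and the majority quorum of three out of five votes are illustrative assumptions, not the authors' exact configuration or code.

```python
# Sketch of ensemble feature selection with union and quorum combination.
# All thresholds and estimators here are simplified assumptions for the demo.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, SequentialFeatureSelector, RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a network-traffic feature matrix.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)

# Round 1: drop one feature from each highly correlated pair (redundancy filter).
corr = np.abs(np.corrcoef(X, rowvar=False))
keep = [i for i in range(X.shape[1]) if not any(corr[i, j] > 0.95 for j in range(i))]
X = X[:, keep]

k = 8  # equalised size of every individual feature set (an assumption)

def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return set(np.argsort(scores)[-k:])

# Rounds 2-3: each method independently picks an equal-sized feature set.
feature_sets = [
    top_k(f_classif(X, y)[0], k),                                 # ANOVA F-test
    top_k(X.var(axis=0), k),                                      # variance ranking
    set(np.flatnonzero(SequentialFeatureSelector(
        LogisticRegression(max_iter=1000), n_features_to_select=k,
        direction='backward').fit(X, y).get_support())),          # backward search
    set(np.flatnonzero(RFE(LogisticRegression(max_iter=1000),
        n_features_to_select=k).fit(X, y).get_support())),        # RFE
    top_k(np.abs(Lasso(alpha=0.01).fit(X, y).coef_), k),          # LASSO weights
]

# Combination: union keeps any feature chosen at least once; quorum keeps
# features chosen by a majority (here at least 3 of the 5 methods).
union = set().union(*feature_sets)
votes = Counter(f for s in feature_sets for f in s)
quorum = {f for f, v in votes.items() if v >= 3}

for name, feats in [('union', union), ('quorum', quorum)]:
    cols = sorted(feats)
    clf = RandomForestClassifier(random_state=0).fit(X[:, cols], y)
    print(f'{name}: {len(cols)} features, train accuracy {clf.score(X[:, cols], y):.3f}')
```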
Abstract: Missing data presents a significant challenge in statistical analysis and machine learning, often resulting in biased outcomes and diminished efficiency. This comprehensive review investigates various imputation techniques, categorizing them into three primary approaches: deterministic methods, probabilistic models, and machine learning algorithms. Traditional techniques, including mean or mode imputation, regression imputation, and last observation carried forward, are evaluated alongside more contemporary methods such as multiple imputation, expectation-maximization, and deep learning strategies. The strengths and limitations of each approach are outlined, and key considerations for selecting an appropriate method, based on data characteristics and research objectives, are discussed. The importance of evaluating imputation's impact on subsequent analyses is emphasized. This synthesis of recent advancements and best practices provides researchers with a robust framework for effectively handling missing data, thereby improving the reliability of empirical findings across diverse disciplines.
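As a rough illustration of how the main families compare in practice, the sketch below masks entries of a complete data set and scores a deterministic (mean), a machine-learning (k-nearest-neighbour), and a regression-based iterative imputer by reconstruction error. The data set, masking rate, and choice of imputers are assumptions made for this demo and are not drawn from the review.

```python
# Illustrative comparison of imputation strategies on artificially masked data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.datasets import load_diabetes

X, _ = load_diabetes(return_X_y=True)

# Knock out roughly 20% of the entries at random so we know the ground truth.
rng = np.random.default_rng(0)
mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

imputers = {
    'mean (deterministic)': SimpleImputer(strategy='mean'),
    'kNN (machine learning)': KNNImputer(n_neighbors=5),
    'iterative (regression-based)': IterativeImputer(random_state=0),
}

for name, imp in imputers.items():
    X_hat = imp.fit_transform(X_missing)
    rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))
    print(f'{name:<30s} RMSE on masked entries: {rmse:.4f}')
```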