This study addresses the challenges of big data visualization by applying data reduction based on feature selection. The aim is to reduce the volume of big data and minimize model training time (Tt) while maintaining data quality. We use the embedded "Select from model" (SFM) method with the random forest importance (RFI) algorithm and compare it with the filter "Select percentile" (SP) method based on the chi-square (Chi2) statistic for selecting the most important features, which are then fed into a classification process using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is also compared with that of KNN in Python on eight data sets to see which method performs best when feature selection is applied. The study concludes that feature selection has a significant impact on the analysis and visualization of the data once redundant data and data that do not affect the target are removed. After several comparisons, the study proposes SFMLR: SFM based on the RFI algorithm for feature selection, with the LR algorithm for data classification. The proposal's efficacy was demonstrated by comparing its results with recent literature.
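The comparison described in this abstract can be sketched with scikit-learn, whose `SelectFromModel`, `SelectPercentile`, and `chi2` appear to be the tools the abstract names. This is a minimal sketch under stated assumptions, not the authors' code: the breast-cancer data set stands in for the eight unnamed data sets, and the 50th percentile, the forest size, the number of neighbors, and the min-max scaling step are illustrative choices that the abstract does not specify.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Stand-in data set (assumption): the abstract does not name its eight data sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Embedded method (SFM with random forest importances) vs. filter method (SP with chi2).
selectors = {
    "SFM": lambda: SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0)
    ),
    "SP": lambda: SelectPercentile(chi2, percentile=50),
}
classifiers = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
}

results = {}
for s_name, make_selector in selectors.items():
    for c_name, make_clf in classifiers.items():
        # MinMaxScaler keeps training features non-negative, which chi2 requires.
        pipe = make_pipeline(MinMaxScaler(), make_selector(), make_clf())
        start = time.perf_counter()
        pipe.fit(X_train, y_train)
        train_time = time.perf_counter() - start  # Tt in the abstract's notation
        accuracy = pipe.score(X_test, y_test)     # AC in the abstract's notation
        n_kept = int(pipe[1].get_support().sum())
        results[(s_name, c_name)] = (accuracy, train_time, n_kept)

for (s_name, c_name), (accuracy, train_time, n_kept) in results.items():
    print(f"{s_name}+{c_name}: AC={accuracy:.3f}  Tt={train_time:.3f}s  features={n_kept}")
```

Wrapping each selector and classifier in one pipeline means the measured training time covers both the selection step and the classifier fit, and the selector never sees the test split.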
Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks that often bypass untrained intrusion detection systems (IDSs). Greater attention has therefore been directed toward finding better methods for identifying attack types so that IDSs can be trained more effectively. Key cyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types to train IDSs to become more effective in key areas. Despite the growth in IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we used this data set in our experiment. In this study, we used a hash algorithm, a weight table, and a sampling method to deal with the inherent problems of analyzing big data: volume, variety, and velocity. By applying a visualization algorithm, we were able to gain insights into the KDD99 data set, clearly identifying "normal" clusters and describing distinct clusters of effective attacks.
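The abstract does not say how the hash algorithm, weight table, and sampling method fit together, so the following is only one plausible reading, offered as a hedged sketch rather than the authors' procedure. It assumes that records are hashed on their categorical fields, that the weight table counts records per hash bucket, and that sampling keeps a few records per bucket while carrying the original count along. All function names, the `per_bucket` parameter, and the toy records (which borrow KDD99-style field values) are hypothetical.

```python
import hashlib
import random
from collections import Counter, defaultdict


def bucket_of(record):
    """Hash the categorical fields (protocol, service, flag) into a stable bucket id."""
    key = "|".join(record[:3]).encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def build_weight_table(records):
    """Weight table (assumed meaning): number of records per hash bucket."""
    return Counter(bucket_of(r) for r in records)


def sample_with_weights(records, weights, per_bucket=2, seed=0):
    """Keep at most `per_bucket` records per bucket, paired with the bucket's weight."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[bucket_of(r)].append(r)
    sample = []
    for bucket, members in groups.items():
        k = min(per_bucket, len(members))
        for r in rng.sample(members, k):
            # The weight preserves how common this kind of record was before sampling.
            sample.append((r, weights[bucket]))
    return sample


# Toy records in the shape (protocol, service, flag, label); not real KDD99 rows.
records = (
    [("tcp", "http", "SF", "normal")] * 50
    + [("icmp", "ecr_i", "SF", "smurf")] * 30
    + [("tcp", "private", "S0", "neptune")] * 20
    + [("tcp", "telnet", "SF", "buffer_overflow")] * 1
)

weights = build_weight_table(records)
sample = sample_with_weights(records, weights)
print(len(records), "records reduced to", len(sample))
```

Under these assumptions the reduced sample is small enough to plot, rare attack types are not drowned out by frequent ones, and the attached weights let a visualization scale each cluster by its original frequency.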