Abstract: The emergence of digital networks and the wide adoption of information on internet platforms have given rise to threats against users' private information. Many intruders actively seek such private data, either for sale or for other inappropriate purposes. Similarly, national and international organizations hold country-level and company-level private information that could be accessed through different network attacks. A Network Intruder Detection System (NIDS) is therefore essential for protecting these networks and organizations. In the evolution of NIDS, Artificial Intelligence (AI)-assisted tools and methods have been widely adopted to provide effective solutions. However, the development of NIDS still faces challenges at the dataset and machine learning levels, such as large deviations in numeric features, numerous irrelevant categorical feature values that call for cardinality reduction, and class imbalance in multiclass-level data. To address these challenges and offer a unified solution to NIDS development, this study proposes a novel framework that preprocesses datasets and applies a Box-Cox power transformation to the numeric features to bring them into closer alignment. Cardinality reduction was applied to categorical features through the binning method. Subsequently, class imbalance in the dataset was addressed using the adaptive synthetic sampling data generation method. Finally, the preprocessed, refined, and oversampled feature set was divided into training and test sets in an 80–20 ratio, and two experiments were conducted. In Experiment 1, binary classification was performed using four machine learning classifiers, with the extra trees classifier achieving the highest accuracy of 97.23% and an AUC of 0.9961. In Experiment 2, multiclass classification was performed, and the extra trees classifier was again the most effective, achieving an accuracy of 81.27% and an AUC of 0.97. The results were also evaluated in terms of training, testing, and total time, and a comparative analysis with state-of-the-art studies demonstrated the robustness and significance of the applied methods in developing a timely and precision-efficient solution to NIDS.
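The two feature-level preprocessing steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the Box-Cox parameter was chosen or how the bins were formed, so the fixed `lam` argument, the `min_count` threshold, the `"other"` label, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values for a given lambda.

    lam is passed in directly here (an assumption); in practice it is usually
    estimated from the data.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        # The lambda = 0 case is defined as the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def bin_rare_categories(values, min_count, other_label="other"):
    """Reduce cardinality by merging categories seen fewer than min_count times."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical toy data: a heavily skewed numeric feature and a categorical one.
durations = [1.0, 10.0, 100.0, 1000.0]
services = ["http", "http", "http", "ftp", "smtp", "http"]

print(box_cox(durations, lam=0))
print(bin_rare_categories(services, min_count=2))
```

With `lam=0` the transform reduces to a logarithm, which compresses the spread of the skewed feature; the binning step keeps frequent categories and merges the rest into a single label.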
Abstract: Developments in biomedical science and signal processing technologies have led to electroencephalography (EEG) signals being widely used in the diagnosis of brain disease and in the field of Brain-Computer Interface (BCI). The collected EEG signals are processed using machine learning algorithms (Random Forest and Naive Bayes) and deep learning algorithms (Recurrent Neural Network (RNN), Neural Network (NN), and Long Short-Term Memory (LSTM)) to infer a person's current mood. These algorithms were applied to the data set to determine what the person is feeling at a particular moment. This thesis aims to identify which of the following moods (happy, surprised, disgust, fear, anger, and sadness) a person is experiencing at a given instant, with the result obtained with as little time delay as possible, since mood changes over time. The accuracy of the output varies with the algorithm used and the time taken to process the data, which makes it possible to compare the reliability and dependability of one algorithm against another before practical implementation. The data sets used were class-imbalanced, and overfitting occurred as a result. This problem was handled by generating synthetic data with the SMOTE oversampling technique.
Funding: supported by the National Key Research and Development Program of China (2018YFB1003700), the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776), the "333" Project of Jiangsu Province (BRA2017228, BRA2017401), and the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).
Abstract: For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets; it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the minority class samples into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in the core samples and borderline samples, different strategies are used to over-sample each of the two groups. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
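The interpolation step that this abstract attributes to SMOTE can be sketched as follows. This is plain SMOTE-style interpolation only, not DSMOTE: the optimized DBSCAN grouping and the separate strategies for core and borderline samples are not specified in the abstract and are left out. The function name `smote_sample`, its parameters, and the toy points are hypothetical.

```python
import math
import random


def smote_sample(minority, k, n_new, rng):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy minority class in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority, k=2, n_new=5, rng=random.Random(0))
print(new_points)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region they span. Interpolating towards a noisy sample therefore produces further noisy samples, which is one reason the abstract describes removing noise samples before synthesis.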
Abstract: In image-to-image translation, the goal is to learn a mapping from one image domain to another. In supervised approaches, the mapping is learned from paired samples. However, collecting large sets of image pairs is often either prohibitively expensive or not possible. As a result, in recent years more attention has been given to techniques that learn the mapping from unpaired sets. In our work, we show that injecting implicit pairs into unpaired sets strengthens the mapping between the two domains, improves the compatibility of their distributions, and boosts the performance of unsupervised techniques by up to 12% across several measurements. The competence of the implicit pairs is further demonstrated with pseudo-pairs, i.e., paired samples that only approximate a real pair. We demonstrate the effect of the approximated implicit samples on image-to-image translation problems where such pseudo-pairs may be synthesized in one direction but not in the other. We further show that pseudo-pairs are significantly more effective as implicit pairs in an unpaired setting than when used explicitly in a paired setting.