Funding: Supported by the Tianshan Talent Training Project-Xinjiang Science and Technology Innovation Team Program (2023TSYCTD).
Abstract: With the increasing dimensionality of network traffic, extracting effective traffic features and improving the identification accuracy for different types of intrusion traffic have become critical in intrusion detection systems (IDS). However, both unsupervised and semi-supervised anomalous traffic detection methods ignore potential correlations between features, so the resulting feature set is not optimal. To extract more representative traffic features and improve the accuracy of traffic identification, this paper proposes a feature dimensionality reduction method that combines principal component analysis (PCA) with Hotelling's T^2, together with a multilayer convolutional bidirectional long short-term memory (MSC_BiLSTM) classifier model for network traffic intrusion detection. The method reduces the model's parameters and redundancy through feature extraction and uses a bidirectional long short-term memory (BiLSTM) network to extract dependencies within the data, taking into account the influence of both preceding and following features. The dimensionality of the network traffic is first reduced by PCA, and the resulting principal components are then used as input to Hotelling's T^2 to compare the differences between groups. For datasets with outliers, Hotelling's T^2 can help identify the groups in which the outliers are located and quantify how extreme they are. Finally, a multilayer convolutional neural network and a BiLSTM network are used to extract the spatial and temporal features of the network traffic data. Experimental results show that the proposed approach achieves better precision, recall, and F1-score than existing techniques. On the CIC-IDS 2017 dataset, the proposed MSC_BiLSTM model achieves an intrusion detection accuracy, precision, and F1-score of 98.71%, 95.97%, and 90.22%, respectively.
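The PCA-then-Hotelling's-T^2 step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pca_hotelling_t2`, the synthetic data, and the choice of three components are assumptions made for illustration, and the MSC_BiLSTM classifier is not covered.

```python
import numpy as np

def pca_hotelling_t2(X, n_components):
    """Project X onto its leading principal components and return
    (scores, t2), where t2[i] is Hotelling's T^2 of sample i in PC space.
    Hypothetical helper, not taken from the paper."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each feature
    # SVD of the centered data: rows of Vt are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T                # principal-component scores
    var = s[:n_components] ** 2 / (X.shape[0] - 1)   # sample variance of each PC
    t2 = np.sum(scores ** 2 / var, axis=1)           # T^2 = sum_k score_k^2 / var_k
    return scores, t2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # synthetic stand-in for traffic features
X[0] += 8.0                      # inject one obvious outlier in row 0
scores, t2 = pca_hotelling_t2(X, n_components=3)
outlier_index = int(np.argmax(t2))   # sample with the largest T^2
```

Samples with a large T^2 lie far from the centre of the data in the retained principal-component space, which is one way to quantify how extreme an outlier is.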
Funding: Supported by the National Natural Science Foundation of China (11971433), the First Class Discipline of Zhejiang-A (Zhejiang Gongshang University-Statistics), and the Hunan Soft Science Research Project (2012ZK3064).
Abstract: Detecting differential expression of genes is common in genomic research (e.g., on 2019-nCoV). Because of cost, only a small sample is typically used to estimate a large number of variances (or their inverses) simultaneously. However, the commonly used approaches perform unreliably. By borrowing information across variables or using prior information about the variables, shrinkage estimation approaches have been proposed, and some optimal shrinkage estimators have been obtained in the asymptotic sense. In this paper, we focus on the small-sample setting and give a likelihood-unbiased estimator for powers of variances under the assumption that the variances follow a chi-squared distribution. Simulation results show that the likelihood-unbiased estimators for the variances and their inverses perform very well. In addition, an application comparison and a real data analysis indicate that the proposed estimator also works well.
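To show why small samples make naive estimates of variance powers unreliable, the sketch below computes the classical bias-correction factor under a chi-squared model. It is the textbook unbiased estimator, not the likelihood-unbiased estimator proposed in the paper, and the function name `unbiased_power_factor` is hypothetical. It assumes that nu * s2 / sigma2 follows a chi-squared distribution with nu degrees of freedom.

```python
import math

def unbiased_power_factor(nu, t):
    """Factor c such that c * (s2 ** t) is unbiased for sigma2 ** t when
    nu * s2 / sigma2 ~ chi-squared(nu). Requires nu / 2 + t > 0.
    Classical result, shown for illustration only."""
    if nu / 2 + t <= 0:
        raise ValueError("need nu/2 + t > 0 for the moment to exist")
    # E[(s2)^t] = sigma2^t * (2/nu)^t * Gamma(nu/2 + t) / Gamma(nu/2),
    # so c is the reciprocal of the factor multiplying sigma2^t.
    log_c = t * math.log(nu / 2) + math.lgamma(nu / 2) - math.lgamma(nu / 2 + t)
    return math.exp(log_c)

# With 5 observations (nu = 4), 1/s2 overestimates 1/sigma2 by a factor of 2 on average:
inverse_factor = unbiased_power_factor(4, -1)   # (nu - 2) / nu
variance_factor = unbiased_power_factor(7, 1)   # s2 itself is already unbiased
```

For t = -1 the factor reduces to (nu - 2) / nu, so the correction is large exactly when the sample is small.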
Funding: Supported by the National University of Singapore Academic Research Grant (Grant No. R-155-000-085-112).
Abstract: For several decades, much attention has been paid to the two-sample Behrens-Fisher (BF) problem, which tests the equality of the means or mean vectors of two normal populations with unequal variance/covariance structures. Little work, however, has been done on the k-sample BF problem for high-dimensional data, which tests the equality of the mean vectors of several high-dimensional normal populations with unequal covariance structures. In this paper, we study this challenging problem by extending the famous Scheffe's transformation method, which reduces the k-sample BF problem to a one-sample problem. The induced one-sample problem can be easily tested by the classical Hotelling's T^2 test when the size of the resulting sample is very large relative to its dimensionality. For high-dimensional data, however, the dimensionality of the resulting sample is often very large, and even much larger than its sample size, which makes the classical Hotelling's T^2 test not powerful or not even well defined. To overcome this difficulty, we propose and study an L2-norm based test. The asymptotic powers of the proposed L2-norm based test and Hotelling's T^2 test are derived and theoretically compared. Methods for implementing the L2-norm based test are described. Simulation studies are conducted to compare the L2-norm based test and Hotelling's T^2 test when the latter is well defined, and to compare the proposed implementation methods for the L2-norm based test otherwise. The methodologies are motivated and illustrated by a real data example.
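The remark that the classical Hotelling's T^2 test may not be well defined in high dimensions can be made concrete with a short sketch. It computes the standard one-sample statistic and reports when the sample covariance matrix is singular. The function name `hotelling_t2_one_sample` and the synthetic data are assumptions for illustration, the sketch assumes at least two observations, and neither the paper's L2-norm based test nor the Scheffe transformation is implemented here.

```python
import numpy as np

def hotelling_t2_one_sample(X, mu0):
    """Return Hotelling's T^2 for H0: mean == mu0, or None when the sample
    covariance is singular, in which case the statistic is not defined.
    Hypothetical helper for illustration."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    S = np.atleast_2d(np.cov(X, rowvar=False))   # p x p sample covariance
    if np.linalg.matrix_rank(S) < p:             # always the case when n <= p
        return None
    d = X.mean(axis=0) - mu0
    return float(n * d @ np.linalg.solve(S, d))  # n * d' S^{-1} d

rng = np.random.default_rng(1)
t2_low = hotelling_t2_one_sample(rng.normal(size=(50, 3)), np.zeros(3))
t2_high = hotelling_t2_one_sample(rng.normal(size=(5, 20)), np.zeros(20))
```

With 5 observations in 20 dimensions, the sample covariance has rank at most 4, so it cannot be inverted and the function returns None. When n > p and the data are normal, (n - p) / (p (n - 1)) times T^2 follows an F distribution with (p, n - p) degrees of freedom under the null hypothesis.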