Abstract: Advanced Metering Infrastructure (AMI) is the metering network of the smart grid that enables bidirectional communication between each consumer's premises and the provider's control center. The massive amount of data collected supports the real-time decision-making required for diverse applications. The communication infrastructure relies on different network types, including the Internet. This makes the infrastructure vulnerable to various attacks, which could compromise security or have devastating effects. However, traditional machine learning solutions cannot adapt to the increasing complexity and diversity of attacks. The objective of this paper is to develop an Anomaly Detection System (ADS) based on deep learning using the CIC-IDS2017 dataset. Because this dataset is highly imbalanced, a two-step sampling technique combining random under-sampling and the Synthetic Minority Oversampling Technique (SMOTE) is proposed to balance it. The proposed system uses an Auto-encoder (AE) with multiple hidden layers for feature extraction and dimensionality reduction. In addition, a voting ensemble based on Random Forest (RF) and a Convolutional Neural Network (CNN) is developed to classify the multiclass attack categories. The proposed system is evaluated against six state-of-the-art machine learning and deep learning algorithms: RF, Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), CNN, Long Short-Term Memory (LSTM), and bidirectional LSTM (biLSTM). Experimental results show that the proposed model improves detection for each attack class compared with the other machine learning and deep learning models, with an overall accuracy of 98.29%, precision of 99%, recall of 98%, F1 score of 98%, and an UNDetection rate (UND) of 8%.
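The abstract does not say how the RF and CNN votes are combined. As a minimal sketch, assuming soft voting (a weighted average of each model's per-class probabilities), the combination step could look like the following. The class labels, probability values, and function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Return the weighted average of per-class probability vectors."""
    if not prob_sets or len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must score the same classes")
    total = sum(weights)
    return [
        sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def predict(prob_sets: Sequence[Sequence[float]],
            weights: Sequence[float],
            labels: Sequence[str]) -> str:
    """Pick the label whose averaged probability is highest."""
    avg = soft_vote(prob_sets, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical labels and model outputs for a single traffic flow.
LABELS = ["Benign", "DoS", "PortScan"]
rf_probs = [0.50, 0.40, 0.10]   # assumed Random Forest class probabilities
cnn_probs = [0.20, 0.70, 0.10]  # assumed CNN softmax output

# Equal weights are an assumption; the abstract gives no weighting scheme.
averaged = soft_vote([rf_probs, cnn_probs], [1.0, 1.0])
predicted = predict([rf_probs, cnn_probs], [1.0, 1.0], LABELS)
```

Here the two models disagree on the top class, and averaging lets the more confident model decide. Hard (majority) voting would be an equally plausible reading of the abstract, although with only two voters it would need a tie-breaking rule.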
Funding: The work described in this paper was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region [CityU 11200218], one grant from the Health and Medical Research Fund, the Food and Health Bureau, The Government of the Hong Kong Special Administrative Region [07181426], and funding from the Hong Kong Institute for Data Science (HKIDS) at City University of Hong Kong. The work described in this paper was partially supported by two grants from City University of Hong Kong (CityU 11202219, CityU 11203520). This research was substantially sponsored by the research project (Grant No. 32000464) supported by the National Natural Science Foundation of China and was substantially supported by the Shenzhen Research Institute, City University of Hong Kong. The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research with the project number (442/77).
Abstract: Bioactive compounds in plants, which can be synthesized using N-arylation methods such as the Buchwald-Hartwig reaction, are essential in drug discovery for their pharmacological effects. Important descriptors are necessary for the estimation of yields in these reactions. This study explores ten metaheuristic algorithms for descriptor selection and models a voting ensemble for evaluation. The algorithms were evaluated based on computational time and the number of selected descriptors. Analyses show that robust performance is obtained with more descriptors, compared to cases where fewer descriptors are selected. The essential descriptor was deduced based on the frequency of occurrence within the 50 extracted data subsets, and better performance was achieved with the voting ensemble than with the other algorithms, with an RMSE of 6.4270 and R^2 of 0.9423. The results and deductions from this study can be readily applied in the decision-making process of chemical synthesis by saving the computational cost associated with initial descriptor selection for yield estimation. The ensemble model has also shown robust performance in its yield estimation ability and efficiency.
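For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of RMSE and R^2, together with one plausible way to rank descriptors by how often they are selected across subsets. The yield values, descriptor names, and function names are hypothetical placeholders, and the frequency count is an assumed reading of "frequency of occurrence", not the study's actual procedure.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def descriptor_frequency(
        subsets: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Count in how many subsets each descriptor appears, most frequent first."""
    counts: Counter = Counter()
    for subset in subsets:
        counts.update(set(subset))  # count each descriptor once per subset
    return counts.most_common()


# Hypothetical reaction yields (%) and model predictions.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
yield_rmse = rmse(observed, predicted)
yield_r2 = r_squared(observed, predicted)

# Hypothetical descriptor subsets, e.g. one per selection run.
runs = [["d1", "d2"], ["d2", "d3"], ["d2", "d1"]]
ranking = descriptor_frequency(runs)
```

In this toy data, `d2` is selected in every run and therefore ranks first, which mirrors how a most frequently selected descriptor could be singled out as essential.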