Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL...Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL)models find helpful in the detection and classification of anomalies.This article designs an oversampling with an optimal deep learning-based streaming data classification(OS-ODLSDC)model.The aim of the OSODLSDC model is to recognize and classify the presence of anomalies in the streaming data.The proposed OS-ODLSDC model initially undergoes preprocessing step.Since streaming data is unbalanced,support vector machine(SVM)-Synthetic Minority Over-sampling Technique(SVM-SMOTE)is applied for oversampling process.Besides,the OS-ODLSDC model employs bidirectional long short-term memory(Bi LSTM)for AD and classification.Finally,the root means square propagation(RMSProp)optimizer is applied for optimal hyperparameter tuning of the Bi LSTM model.For ensuring the promising performance of the OS-ODLSDC model,a wide-ranging experimental analysis is performed using three benchmark datasets such as CICIDS 2018,KDD-Cup 1999,and NSL-KDD datasets.展开更多
BACKGROUND Postoperative delirium,particularly prevalent in elderly patients after abdominal cancer surgery,presents significant challenges in clinical management.AIM To develop a synthetic minority oversampling techn...BACKGROUND Postoperative delirium,particularly prevalent in elderly patients after abdominal cancer surgery,presents significant challenges in clinical management.AIM To develop a synthetic minority oversampling technique(SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients.METHODS In this retrospective cohort study,we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022.The incidence of postoperative delirium was recorded for 7 d post-surgery.Patients were divided into delirium and non-delirium groups based on the occurrence of postoperative delirium or not.A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium.The SMOTE technique was applied to enhance the model by oversampling the delirium cases.The model’s predictive accuracy was then validated.RESULTS In our study involving 611 elderly patients with abdominal malignant tumors,multivariate logistic regression analysis identified significant risk factors for postoperative delirium.These included the Charlson comorbidity index,American Society of Anesthesiologists classification,history of cerebrovascular disease,surgical duration,perioperative blood transfusion,and postoperative pain score.The incidence rate of postoperative delirium in our study was 22.91%.The original predictive model(P1)exhibited an area under the receiver operating characteristic curve of 0.862.In comparison,the SMOTE-based logistic early warning model(P2),which utilized the SMOTE oversampling algorithm,showed a slightly lower but comparable area under the curve of 0.856,suggesting no significant difference in performance between the two predictive approaches.CONCLUSION This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods,effectively addressing data imbalance.展开更多
Due to the anonymity of blockchain,frequent security incidents and attacks occur through it,among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses.Machine learningbase...Due to the anonymity of blockchain,frequent security incidents and attacks occur through it,among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses.Machine learningbased methods are believed to be promising for detecting ethereum Ponzi schemes.However,there are still some flaws in current research,e.g.,insufficient feature extraction of Ponzi scheme smart contracts,without considering class imbalance.In addition,there is room for improvement in detection precision.Aiming at the above problems,this paper proposes an ethereum Ponzi scheme detection scheme through opcode context analysis and adaptive boosting(AdaBoost)algorithm.Firstly,this paper uses the n-gram algorithm to extract more comprehensive contract opcode features and combine them with contract account features,which helps to improve the feature extraction effect.Meanwhile,adaptive synthetic sampling(ADASYN)is introduced to deal with class imbalanced data,and integrated with the Adaboost classifier.Finally,this paper uses the improved AdaBoost classifier for the identification of Ponzi scheme contracts.Experimentally,this paper tests our model in real-world smart contracts and compares it with representative methods in the aspect of F1-score and precision.Moreover,this article compares and discusses the state of art methods with our method in four aspects:data acquisition,data preprocessing,feature extraction,and classifier design.Both experiment and discussion validate the effectiveness of our model.展开更多
HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort...HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort has been taken to reduce new HIV infections, but there are still a significant number of new infections reported. HIV prevalence is more skewed towards the key population who include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key population enrolled in a comprehensive HIV and AIDS programme by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow up, defaulted (dropped out, transferred out, or relocated) or died were classified as attrition;while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences of key population attrition in the 19 targeted counties, and used Kilifi county as an example to map attrition cases in smaller administrative areas (sub-county level). The study used synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets since the cases of attrition were much less than retention. The random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality and their survival time using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and way better than random chance with concordance indices greater than 0.75.展开更多
According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network(CNN) is designed to automatically extrac...According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network(CNN) is designed to automatically extract small target features and suppress clutters in an end-to-end manner. The input of CNN is an original oversampling image while the output is a cluttersuppressed feature map. The CNN contains only convolution and non-linear operations, and the resolution of the output feature map is the same as that of the input image. The L1-norm loss function is used, and a mass of training data is generated to train the network effectively. Results show that compared with several baseline methods, the proposed method improves the signal clutter ratio gain and background suppression factor by 3–4 orders of magnitude, and has more powerful target detection performance.展开更多
Traditional ECG acquisition system lacks for flexibility. To improve the flexibility of ECG acquisition system and the signal-to-noise ratio of ECG, a new ECG acquisition system was designed based on DAQ card and Labv...Traditional ECG acquisition system lacks for flexibility. To improve the flexibility of ECG acquisition system and the signal-to-noise ratio of ECG, a new ECG acquisition system was designed based on DAQ card and Labview and oversampling was implemented in Labview. And analog signal conditioning circuit was improved on. The result indicated that the system could detect ECG signal accurately with high signal-to-noise ratio and the signal processing methods could be adjusted easily. So the new system can satisfy many kinds of ECG acquisition. It is a flexible experiment platform for exploring new ECG acquisition methods.展开更多
Learning from imbalanced data is one of the greatest challenging problems in binary classification,and this problem has gained more importance in recent years.When the class distribution is imbalanced,classical machin...Learning from imbalanced data is one of the greatest challenging problems in binary classification,and this problem has gained more importance in recent years.When the class distribution is imbalanced,classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority.Therefore,the accuracy may be high,but the model cannot recognize data instances in the minority class to classify them,leading to many misclassifications.Different methods have been proposed in the literature to handle the imbalance problem,but most are complicated and tend to simulate unnecessary noise.In this paper,we propose a simple oversampling method based on Multivariate Gaussian distribution and K-means clustering,called GK-Means.The new method aims to avoid generating noise and control imbalances between and within classes.Various experiments have been carried out with six classifiers and four oversampling methods.Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and Accuracy.展开更多
An efficient single-carrier symbol synchronization method is proposed in this paper,which can work under a very low oversampling rate.This method is based on the frequency aliasing squared timing recovery assisted by ...An efficient single-carrier symbol synchronization method is proposed in this paper,which can work under a very low oversampling rate.This method is based on the frequency aliasing squared timing recovery assisted by pilot symbols and time domain filter.With frequency aliasing squared timing recovery with pilots,it is accessible to estimate timing error under oversampling rate less than 2.The time domain filter simultaneously performs matched-filtering and arbitrary interpolation.Because of pilot assisting,timing error estimation can be free from alias and self noise,so our method has good performance.Compared with traditional time-domain methods requiring oversampling rate above 2,this method can be adapted to any rational oversampling rate including less than 2.Moreover,compared with symbol synchronization in frequency domain which can operate under low oversampling rate,our method saves the complicated operation of conversion between time domain and frequency domain.By low oversampling rate and resource saving filter,this method is suitable for ultra-high-speed communication systems under resource-restricted hardware.The paper carries on the simulation and realization under 64QAM system.The simulation result shows that the loss is very low(less than 0.5 dB),and the real-time implementation on field programmable gate array(FPGA)also works fine.展开更多
A reconfigurable intelligent surface(RIS)aided massive multiple-input multiple-output(MIMO)system is considered,where the base station employs a large antenna array with low-cost and low-power 1-bit analog-to-digital ...A reconfigurable intelligent surface(RIS)aided massive multiple-input multiple-output(MIMO)system is considered,where the base station employs a large antenna array with low-cost and low-power 1-bit analog-to-digital converters(ADCs).To compensate for the per-formance loss caused by the coarse quantization,oversampling is applied at the receiver.The main challenge for the acquisition of cascaded channel state information in such a system is to handle the distortion caused by the 1-bit quantization and the sample correlation caused by oversampling.In this work,Bussgang decomposition is applied to deal with the coarse quantization,and a Markov chain is developed to char-acterize the banded structure of the oversampling filter.An approximate message-passing based algorithm is proposed for the estimation of the cascaded channels.Simulation results demonstrate that our proposed 1-bit systems with oversampling can approach the 2-bit systems in terms of the mean square error performance while the former consumes much less power at the receiver.展开更多
针对动态变化的信道环境,自适应正交频分复用(Orthogonal Frequency Division Multiplexing,OFDM)系统可以对子载波间隔和循环前缀长度进行调整,以最大化系统的吞吐量。为了能够快速准确地找到OFDM系统在不同信道环境中的最优子载波间...针对动态变化的信道环境,自适应正交频分复用(Orthogonal Frequency Division Multiplexing,OFDM)系统可以对子载波间隔和循环前缀长度进行调整,以最大化系统的吞吐量。为了能够快速准确地找到OFDM系统在不同信道环境中的最优子载波间隔和循环前缀长度取值,本文提出了基于随机森林的OFDM系统自适应算法。随机森林算法基于集成的思想,能够有效处理高维度数据,并且具有高效率、高准确率和强泛化能力等优势,可以在复杂的数据场景下进行有效的分类。通过提取通信过程中信噪比、用户移动速度、最大多普勒频率和均方根时延扩展等信道特征与OFDM系统的子载波间隔和循环前缀长度组成训练样本,利用随机森林算法创建了OFDM系统参数多分类模型。所提模型可以根据输入的信道特征,实现OFDM系统子载波间隔和循环前缀长度的自适应分配。同时,针对训练样本主要集中在少数几个系统参数类别的情况,利用合成少数类过采样技术对较少样本数的类别进行扩充,满足了随机森林算法对训练样本类别平衡化的需求,进一步提高了算法的分类准确率。相比传统的自适应算法,所提算法具有更高的分类准确率和模型泛化能力。分析和仿真结果表明,与子载波间隔和循环前缀长度固定的OFDM系统相比,本文所提出的自适应算法能够准确选择出最优的系统参数,可以有效地减轻信道中符号间干扰和子载波间干扰的影响,从而在整个信噪比范围上提供最大的平均频谱效率。基于随机森林的OFDM系统自适应算法能够动态地分配子载波间隔和循环前缀长度,增强OFDM系统的通信质量和抗干扰能力,实现在不同信道环境下的可靠传输。展开更多
文摘Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL)models find helpful in the detection and classification of anomalies.This article designs an oversampling with an optimal deep learning-based streaming data classification(OS-ODLSDC)model.The aim of the OSODLSDC model is to recognize and classify the presence of anomalies in the streaming data.The proposed OS-ODLSDC model initially undergoes preprocessing step.Since streaming data is unbalanced,support vector machine(SVM)-Synthetic Minority Over-sampling Technique(SVM-SMOTE)is applied for oversampling process.Besides,the OS-ODLSDC model employs bidirectional long short-term memory(Bi LSTM)for AD and classification.Finally,the root means square propagation(RMSProp)optimizer is applied for optimal hyperparameter tuning of the Bi LSTM model.For ensuring the promising performance of the OS-ODLSDC model,a wide-ranging experimental analysis is performed using three benchmark datasets such as CICIDS 2018,KDD-Cup 1999,and NSL-KDD datasets.
基金Supported by Discipline Advancement Program of Shanghai Fourth People’s Hospital,No.SY-XKZT-2020-2013.
文摘BACKGROUND Postoperative delirium,particularly prevalent in elderly patients after abdominal cancer surgery,presents significant challenges in clinical management.AIM To develop a synthetic minority oversampling technique(SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients.METHODS In this retrospective cohort study,we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022.The incidence of postoperative delirium was recorded for 7 d post-surgery.Patients were divided into delirium and non-delirium groups based on the occurrence of postoperative delirium or not.A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium.The SMOTE technique was applied to enhance the model by oversampling the delirium cases.The model’s predictive accuracy was then validated.RESULTS In our study involving 611 elderly patients with abdominal malignant tumors,multivariate logistic regression analysis identified significant risk factors for postoperative delirium.These included the Charlson comorbidity index,American Society of Anesthesiologists classification,history of cerebrovascular disease,surgical duration,perioperative blood transfusion,and postoperative pain score.The incidence rate of postoperative delirium in our study was 22.91%.The original predictive model(P1)exhibited an area under the receiver operating characteristic curve of 0.862.In comparison,the SMOTE-based logistic early warning model(P2),which utilized the SMOTE oversampling algorithm,showed a slightly lower but comparable area under the curve of 0.856,suggesting no significant difference in performance between the two predictive approaches.CONCLUSION This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods,effectively addressing data imbalance.
基金This work was supported by National Key R&D Program of China(Grant Numbers 2020YFB1005900,2022YFB3305802).
文摘Due to the anonymity of blockchain,frequent security incidents and attacks occur through it,among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses.Machine learningbased methods are believed to be promising for detecting ethereum Ponzi schemes.However,there are still some flaws in current research,e.g.,insufficient feature extraction of Ponzi scheme smart contracts,without considering class imbalance.In addition,there is room for improvement in detection precision.Aiming at the above problems,this paper proposes an ethereum Ponzi scheme detection scheme through opcode context analysis and adaptive boosting(AdaBoost)algorithm.Firstly,this paper uses the n-gram algorithm to extract more comprehensive contract opcode features and combine them with contract account features,which helps to improve the feature extraction effect.Meanwhile,adaptive synthetic sampling(ADASYN)is introduced to deal with class imbalanced data,and integrated with the Adaboost classifier.Finally,this paper uses the improved AdaBoost classifier for the identification of Ponzi scheme contracts.Experimentally,this paper tests our model in real-world smart contracts and compares it with representative methods in the aspect of F1-score and precision.Moreover,this article compares and discusses the state of art methods with our method in four aspects:data acquisition,data preprocessing,feature extraction,and classifier design.Both experiment and discussion validate the effectiveness of our model.
文摘HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort has been taken to reduce new HIV infections, but there are still a significant number of new infections reported. HIV prevalence is more skewed towards the key population who include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key population enrolled in a comprehensive HIV and AIDS programme by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow up, defaulted (dropped out, transferred out, or relocated) or died were classified as attrition;while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences of key population attrition in the 19 targeted counties, and used Kilifi county as an example to map attrition cases in smaller administrative areas (sub-county level). The study used synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets since the cases of attrition were much less than retention. The random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality and their survival time using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and way better than random chance with concordance indices greater than 0.75.
基金supported by the National Key Research and Development Program of China(2016YFB0500901)the Natural Science Foundation of Shanghai(18ZR1437200)the Satellite Mapping Technology and Application National Key Laboratory of Geographical Information Bureau(KLSMTA-201709)
文摘According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network(CNN) is designed to automatically extract small target features and suppress clutters in an end-to-end manner. The input of CNN is an original oversampling image while the output is a cluttersuppressed feature map. The CNN contains only convolution and non-linear operations, and the resolution of the output feature map is the same as that of the input image. The L1-norm loss function is used, and a mass of training data is generated to train the network effectively. Results show that compared with several baseline methods, the proposed method improves the signal clutter ratio gain and background suppression factor by 3–4 orders of magnitude, and has more powerful target detection performance.
文摘Traditional ECG acquisition system lacks for flexibility. To improve the flexibility of ECG acquisition system and the signal-to-noise ratio of ECG, a new ECG acquisition system was designed based on DAQ card and Labview and oversampling was implemented in Labview. And analog signal conditioning circuit was improved on. The result indicated that the system could detect ECG signal accurately with high signal-to-noise ratio and the signal processing methods could be adjusted easily. So the new system can satisfy many kinds of ECG acquisition. It is a flexible experiment platform for exploring new ECG acquisition methods.
文摘Learning from imbalanced data is one of the greatest challenging problems in binary classification,and this problem has gained more importance in recent years.When the class distribution is imbalanced,classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority.Therefore,the accuracy may be high,but the model cannot recognize data instances in the minority class to classify them,leading to many misclassifications.Different methods have been proposed in the literature to handle the imbalance problem,but most are complicated and tend to simulate unnecessary noise.In this paper,we propose a simple oversampling method based on Multivariate Gaussian distribution and K-means clustering,called GK-Means.The new method aims to avoid generating noise and control imbalances between and within classes.Various experiments have been carried out with six classifiers and four oversampling methods.Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and Accuracy.
文摘An efficient single-carrier symbol synchronization method is proposed in this paper,which can work under a very low oversampling rate.This method is based on the frequency aliasing squared timing recovery assisted by pilot symbols and time domain filter.With frequency aliasing squared timing recovery with pilots,it is accessible to estimate timing error under oversampling rate less than 2.The time domain filter simultaneously performs matched-filtering and arbitrary interpolation.Because of pilot assisting,timing error estimation can be free from alias and self noise,so our method has good performance.Compared with traditional time-domain methods requiring oversampling rate above 2,this method can be adapted to any rational oversampling rate including less than 2.Moreover,compared with symbol synchronization in frequency domain which can operate under low oversampling rate,our method saves the complicated operation of conversion between time domain and frequency domain.By low oversampling rate and resource saving filter,this method is suitable for ultra-high-speed communication systems under resource-restricted hardware.The paper carries on the simulation and realization under 64QAM system.The simulation result shows that the loss is very low(less than 0.5 dB),and the real-time implementation on field programmable gate array(FPGA)also works fine.
文摘A reconfigurable intelligent surface(RIS)aided massive multiple-input multiple-output(MIMO)system is considered,where the base station employs a large antenna array with low-cost and low-power 1-bit analog-to-digital converters(ADCs).To compensate for the per-formance loss caused by the coarse quantization,oversampling is applied at the receiver.The main challenge for the acquisition of cascaded channel state information in such a system is to handle the distortion caused by the 1-bit quantization and the sample correlation caused by oversampling.In this work,Bussgang decomposition is applied to deal with the coarse quantization,and a Markov chain is developed to char-acterize the banded structure of the oversampling filter.An approximate message-passing based algorithm is proposed for the estimation of the cascaded channels.Simulation results demonstrate that our proposed 1-bit systems with oversampling can approach the 2-bit systems in terms of the mean square error performance while the former consumes much less power at the receiver.
文摘针对动态变化的信道环境,自适应正交频分复用(Orthogonal Frequency Division Multiplexing,OFDM)系统可以对子载波间隔和循环前缀长度进行调整,以最大化系统的吞吐量。为了能够快速准确地找到OFDM系统在不同信道环境中的最优子载波间隔和循环前缀长度取值,本文提出了基于随机森林的OFDM系统自适应算法。随机森林算法基于集成的思想,能够有效处理高维度数据,并且具有高效率、高准确率和强泛化能力等优势,可以在复杂的数据场景下进行有效的分类。通过提取通信过程中信噪比、用户移动速度、最大多普勒频率和均方根时延扩展等信道特征与OFDM系统的子载波间隔和循环前缀长度组成训练样本,利用随机森林算法创建了OFDM系统参数多分类模型。所提模型可以根据输入的信道特征,实现OFDM系统子载波间隔和循环前缀长度的自适应分配。同时,针对训练样本主要集中在少数几个系统参数类别的情况,利用合成少数类过采样技术对较少样本数的类别进行扩充,满足了随机森林算法对训练样本类别平衡化的需求,进一步提高了算法的分类准确率。相比传统的自适应算法,所提算法具有更高的分类准确率和模型泛化能力。分析和仿真结果表明,与子载波间隔和循环前缀长度固定的OFDM系统相比,本文所提出的自适应算法能够准确选择出最优的系统参数,可以有效地减轻信道中符号间干扰和子载波间干扰的影响,从而在整个信噪比范围上提供最大的平均频谱效率。基于随机森林的OFDM系统自适应算法能够动态地分配子载波间隔和循环前缀长度,增强OFDM系统的通信质量和抗干扰能力,实现在不同信道环境下的可靠传输。