BACKGROUND Postoperative delirium,particularly prevalent in elderly patients after abdominal cancer surgery,presents significant challenges in clinical management.AIM To develop a synthetic minority oversampling techn...BACKGROUND Postoperative delirium,particularly prevalent in elderly patients after abdominal cancer surgery,presents significant challenges in clinical management.AIM To develop a synthetic minority oversampling technique(SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients.METHODS In this retrospective cohort study,we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022.The incidence of postoperative delirium was recorded for 7 d post-surgery.Patients were divided into delirium and non-delirium groups based on the occurrence of postoperative delirium or not.A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium.The SMOTE technique was applied to enhance the model by oversampling the delirium cases.The model’s predictive accuracy was then validated.RESULTS In our study involving 611 elderly patients with abdominal malignant tumors,multivariate logistic regression analysis identified significant risk factors for postoperative delirium.These included the Charlson comorbidity index,American Society of Anesthesiologists classification,history of cerebrovascular disease,surgical duration,perioperative blood transfusion,and postoperative pain score.The incidence rate of postoperative delirium in our study was 22.91%.The original predictive model(P1)exhibited an area under the receiver operating characteristic curve of 0.862.In comparison,the SMOTE-based logistic early warning model(P2),which utilized the SMOTE oversampling algorithm,showed a slightly lower but comparable area under the curve of 0.856,suggesting no significant difference in performance between the two predictive approaches.CONCLUSION This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods,effectively addressing data imbalance.展开更多
Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL...Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL)models find helpful in the detection and classification of anomalies.This article designs an oversampling with an optimal deep learning-based streaming data classification(OS-ODLSDC)model.The aim of the OSODLSDC model is to recognize and classify the presence of anomalies in the streaming data.The proposed OS-ODLSDC model initially undergoes preprocessing step.Since streaming data is unbalanced,support vector machine(SVM)-Synthetic Minority Over-sampling Technique(SVM-SMOTE)is applied for oversampling process.Besides,the OS-ODLSDC model employs bidirectional long short-term memory(Bi LSTM)for AD and classification.Finally,the root means square propagation(RMSProp)optimizer is applied for optimal hyperparameter tuning of the Bi LSTM model.For ensuring the promising performance of the OS-ODLSDC model,a wide-ranging experimental analysis is performed using three benchmark datasets such as CICIDS 2018,KDD-Cup 1999,and NSL-KDD datasets.展开更多
Rockburst is a common geological disaster in underground engineering,which seriously threatens the safety of personnel,equipment and property.Utilizing machine learning models to evaluate risk of rockburst is graduall...Rockburst is a common geological disaster in underground engineering,which seriously threatens the safety of personnel,equipment and property.Utilizing machine learning models to evaluate risk of rockburst is gradually becoming a trend.In this study,the integrated algorithms under Gradient Boosting Decision Tree(GBDT)framework were used to evaluate and classify rockburst intensity.First,a total of 301 rock burst data samples were obtained from a case database,and the data were preprocessed using synthetic minority over-sampling technique(SMOTE).Then,the rockburst evaluation models including GBDT,eXtreme Gradient Boosting(XGBoost),Light Gradient Boosting Machine(LightGBM),and Categorical Features Gradient Boosting(CatBoost)were established,and the optimal hyperparameters of the models were obtained through random search grid and five-fold cross-validation.Afterwards,use the optimal hyperparameter configuration to fit the evaluation models,and analyze these models using test set.In order to evaluate the performance,metrics including accuracy,precision,recall,and F1-score were selected to analyze and compare with other machine learning models.Finally,the trained models were used to conduct rock burst risk assessment on rock samples from a mine in Shanxi Province,China,and providing theoretical guidance for the mine's safe production work.The models under the GBDT framework perform well in the evaluation of rockburst levels,and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.展开更多
Imbalanced datasets are common in practical applications,and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationship ...Imbalanced datasets are common in practical applications,and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationship between data attributes.However,the creation of fuzzy rules typically depends on expert knowledge,which may not fully leverage the label information in training data and may be subjective.To address this issue,a novel fuzzy rule oversampling approach is developed based on the learning vector quantization(LVQ)algorithm.In this method,the label information of the training data is utilized to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ.Subsequently,fuzzy rules are generated and adjusted to calculate rule weights.The number of new samples to be synthesized for each rule is then computed,and samples from the minority class are synthesized based on the newly generated fuzzy rules.This results in the establishment of a fuzzy rule oversampling method based on LVQ.To evaluate the effectiveness of this method,comparative experiments are conducted on 12 publicly available imbalance datasets with five other sampling techniques in combination with the support function machine.The experimental results demonstrate that the proposed method can significantly enhance the classification algorithm across seven performance indicators,including a boost of 2.15%to 12.34%in Accuracy,6.11%to 27.06%in G-mean,and 4.69%to 18.78%in AUC.These show that the proposed method is capable of more efficiently improving the classification performance of imbalanced data.展开更多
Due to the anonymity of blockchain,frequent security incidents and attacks occur through it,among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses.Machine learningbase...Due to the anonymity of blockchain,frequent security incidents and attacks occur through it,among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses.Machine learningbased methods are believed to be promising for detecting ethereum Ponzi schemes.However,there are still some flaws in current research,e.g.,insufficient feature extraction of Ponzi scheme smart contracts,without considering class imbalance.In addition,there is room for improvement in detection precision.Aiming at the above problems,this paper proposes an ethereum Ponzi scheme detection scheme through opcode context analysis and adaptive boosting(AdaBoost)algorithm.Firstly,this paper uses the n-gram algorithm to extract more comprehensive contract opcode features and combine them with contract account features,which helps to improve the feature extraction effect.Meanwhile,adaptive synthetic sampling(ADASYN)is introduced to deal with class imbalanced data,and integrated with the Adaboost classifier.Finally,this paper uses the improved AdaBoost classifier for the identification of Ponzi scheme contracts.Experimentally,this paper tests our model in real-world smart contracts and compares it with representative methods in the aspect of F1-score and precision.Moreover,this article compares and discusses the state of art methods with our method in four aspects:data acquisition,data preprocessing,feature extraction,and classifier design.Both experiment and discussion validate the effectiveness of our model.展开更多
HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort...HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort has been taken to reduce new HIV infections, but there are still a significant number of new infections reported. HIV prevalence is more skewed towards the key population who include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key population enrolled in a comprehensive HIV and AIDS programme by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow up, defaulted (dropped out, transferred out, or relocated) or died were classified as attrition;while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences of key population attrition in the 19 targeted counties, and used Kilifi county as an example to map attrition cases in smaller administrative areas (sub-county level). The study used synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets since the cases of attrition were much less than retention. The random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality and their survival time using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and way better than random chance with concordance indices greater than 0.75.展开更多
According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network(CNN) is designed to automatically extrac...According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network(CNN) is designed to automatically extract small target features and suppress clutters in an end-to-end manner. The input of CNN is an original oversampling image while the output is a cluttersuppressed feature map. The CNN contains only convolution and non-linear operations, and the resolution of the output feature map is the same as that of the input image. The L1-norm loss function is used, and a mass of training data is generated to train the network effectively. Results show that compared with several baseline methods, the proposed method improves the signal clutter ratio gain and background suppression factor by 3–4 orders of magnitude, and has more powerful target detection performance.展开更多
Learning from imbalanced data is one of the greatest challenging problems in binary classification,and this problem has gained more importance in recent years.When the class distribution is imbalanced,classical machin...Learning from imbalanced data is one of the greatest challenging problems in binary classification,and this problem has gained more importance in recent years.When the class distribution is imbalanced,classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority.Therefore,the accuracy may be high,but the model cannot recognize data instances in the minority class to classify them,leading to many misclassifications.Different methods have been proposed in the literature to handle the imbalance problem,but most are complicated and tend to simulate unnecessary noise.In this paper,we propose a simple oversampling method based on Multivariate Gaussian distribution and K-means clustering,called GK-Means.The new method aims to avoid generating noise and control imbalances between and within classes.Various experiments have been carried out with six classifiers and four oversampling methods.Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and Accuracy.展开更多
An efficient single-carrier symbol synchronization method is proposed in this paper,which can work under a very low oversampling rate.This method is based on the frequency aliasing squared timing recovery assisted by ...An efficient single-carrier symbol synchronization method is proposed in this paper,which can work under a very low oversampling rate.This method is based on the frequency aliasing squared timing recovery assisted by pilot symbols and time domain filter.With frequency aliasing squared timing recovery with pilots,it is accessible to estimate timing error under oversampling rate less than 2.The time domain filter simultaneously performs matched-filtering and arbitrary interpolation.Because of pilot assisting,timing error estimation can be free from alias and self noise,so our method has good performance.Compared with traditional time-domain methods requiring oversampling rate above 2,this method can be adapted to any rational oversampling rate including less than 2.Moreover,compared with symbol synchronization in frequency domain which can operate under low oversampling rate,our method saves the complicated operation of conversion between time domain and frequency domain.By low oversampling rate and resource saving filter,this method is suitable for ultra-high-speed communication systems under resource-restricted hardware.The paper carries on the simulation and realization under 64QAM system.The simulation result shows that the loss is very low(less than 0.5 dB),and the real-time implementation on field programmable gate array(FPGA)also works fine.展开更多
Traditional ECG acquisition system lacks for flexibility. To improve the flexibility of ECG acquisition system and the signal-to-noise ratio of ECG, a new ECG acquisition system was designed based on DAQ card and Labv...Traditional ECG acquisition system lacks for flexibility. To improve the flexibility of ECG acquisition system and the signal-to-noise ratio of ECG, a new ECG acquisition system was designed based on DAQ card and Labview and oversampling was implemented in Labview. And analog signal conditioning circuit was improved on. The result indicated that the system could detect ECG signal accurately with high signal-to-noise ratio and the signal processing methods could be adjusted easily. So the new system can satisfy many kinds of ECG acquisition. It is a flexible experiment platform for exploring new ECG acquisition methods.展开更多
A reconfigurable intelligent surface(RIS)aided massive multiple-input multiple-output(MIMO)system is considered,where the base station employs a large antenna array with low-cost and low-power 1-bit analog-to-digital ...A reconfigurable intelligent surface(RIS)aided massive multiple-input multiple-output(MIMO)system is considered,where the base station employs a large antenna array with low-cost and low-power 1-bit analog-to-digital converters(ADCs).To compensate for the per-formance loss caused by the coarse quantization,oversampling is applied at the receiver.The main challenge for the acquisition of cascaded channel state information in such a system is to handle the distortion caused by the 1-bit quantization and the sample correlation caused by oversampling.In this work,Bussgang decomposition is applied to deal with the coarse quantization,and a Markov chain is developed to char-acterize the banded structure of the oversampling filter.An approximate message-passing based algorithm is proposed for the estimation of the cascaded channels.Simulation results demonstrate that our proposed 1-bit systems with oversampling can approach the 2-bit systems in terms of the mean square error performance while the former consumes much less power at the receiver.展开更多
In this editorial,we comment on the article by Hu et al entitled“Predictive modeling for postoperative delirium in elderly patients with abdominal malignancies using synthetic minority oversampling technique”.We wan...In this editorial,we comment on the article by Hu et al entitled“Predictive modeling for postoperative delirium in elderly patients with abdominal malignancies using synthetic minority oversampling technique”.We wanted to draw attention to the general features of postoperative delirium(POD)as well as the areas where there are uncertainties and contradictions.POD can be defined as acute neurocognitive dysfunction that occurs in the first week after surgery.It is a severe postoperative complication,especially for elderly oncology patients.Although the underlying pathophysiological mechanism is not fully understood,various neuroinflammatory mechanisms and neurotransmitters are thought to be involved.Various assessment scales and diagnostic methods have been proposed for the early diagnosis of POD.As delirium is considered a preventable clinical entity in about half of the cases,various early prediction models developed with the support of machine learning have recently become a hot scientific topic.Unfortunately,a model with high sensitivity and specificity for the prediction of POD has not yet been reported.This situation reveals that all health personnel who provide health care services to elderly patients should approach patients with a high level of awareness in the perioperative period regarding POD.展开更多
Delirium,a complex neurocognitive syndrome,frequently emerges following surgery,presenting diverse manifestations and considerable obstacles,especially among the elderly.This editorial delves into the intricate phenom...Delirium,a complex neurocognitive syndrome,frequently emerges following surgery,presenting diverse manifestations and considerable obstacles,especially among the elderly.This editorial delves into the intricate phenomenon of postoperative delirium(POD),shedding light on a study that explores POD in elderly individuals undergoing abdominal malignancy surgery.The study examines pathophysiology and predictive determinants,offering valuable insights into this challenging clinical scenario.Employing the synthetic minority oversampling technique,a predictive model is developed,incorporating critical risk factors such as comorbidity index,anesthesia grade,and surgical duration.There is an urgent need for accurate risk factor identification to mitigate POD incidence.While specific to elderly patients with abdominal malignancies,the findings contribute significantly to understanding delirium pathophysiology and prediction.Further research is warranted to establish standardized predictive for enhanced generalizability.展开更多
Oversampling is widely used in practical applications of digital signal processing. As the fractional Fourier transform has been developed and applied in signal processing fields, it is necessary to consider the overs...Oversampling is widely used in practical applications of digital signal processing. As the fractional Fourier transform has been developed and applied in signal processing fields, it is necessary to consider the oversampling theorem in the fractional Fourier domain. In this paper, the oversampling theorem in the fractional Fourier domain is analyzed. The fractional Fourier spectral relation between the original oversampled sequence and its subsequences is derived first, and then the expression for exact reconstruction of the missing samples in terms of the subsequences is obtained. Moreover, by taking a chirp signal as an example, it is shown that, reconstruction of the missing samples in the oversampled signal is suitable in the fractional Fourier domain for the signal whose time-frequency distribution has the minimum support in the fractional Fourier domain.展开更多
Learning from imbalanced data is a challenging task in a wide range of applications, which attracts significant research efforts from machine learning and data mining community. As a natural approach to this issue, ov...Learning from imbalanced data is a challenging task in a wide range of applications, which attracts significant research efforts from machine learning and data mining community. As a natural approach to this issue, oversampling balances the training samples through replicating existing samples or synthesizing new samples. In general, synthesization outperforms replication by supplying additional information on the minority class. However, the additional information needs to follow the same normal distribution of the training set, which further constrains the new samples within the predefined range of training set. In this paper, we present the Wiener process oversampling (WPO) technique that brings the physics phenomena into sample synthesization. WPO constructs a robust decision region by expanding the attribute ranges in training set while keeping the same normal distribution. The satisfactory performance of WPO can be achieved with much lower computing complexity. In addition, by integrating WPO with ensemble learning, the WPOBoost algorithm outperforms many prevalent imbalance learning solutions.展开更多
Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, just as with any data-mining technique, this approach requires sufficient training data(object usage scena...Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, just as with any data-mining technique, this approach requires sufficient training data(object usage scenarios). Existing approaches resolve the problem by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for object usage scenarios(OUSs). Our technique is based on the inheritance relationship in object-oriented programs. Given an object-oriented program p, generally, the OUSs that can be collected from a run of p are not more than the objects used during the run. With our technique, a maximum of n times more OUSs can be achieved, where n is the average number of super-classes of all general OUSs. To investigate the effect of our technique, we implement it in our previous prototype tool, ISpec Miner, and use the tool to mine protocols from several real-world programs. Experimental results show that our technique can collect 1.95 times more OUSs than general approaches. Additionally, accurate and complete API protocols are more likely to be achieved. Furthermore, our technique can mine API protocols for classes never even used in programs, which are valuable for validating software architectures, program documentation, and understanding. Although our technique will introduce some runtime overhead, it is trivial and acceptable.展开更多
Deep learning offers a novel opportunity to achieve both high-quality and high-speed computer-generated holography(CGH).Current data-driven deep learning algorithms face the challenge that the labeled training dataset...Deep learning offers a novel opportunity to achieve both high-quality and high-speed computer-generated holography(CGH).Current data-driven deep learning algorithms face the challenge that the labeled training datasets limit the training performance and generalization.The model-driven deep learning introduces the diffraction model into the neural network.It eliminates the need for the labeled training dataset and has been extensively applied to hologram generation.However,the existing model-driven deep learning algorithms face the problem of insufficient constraints.In this study,we propose a model-driven neural network capable of high-fidelity 4K computer-generated hologram generation,called 4K Diffraction Model-driven Network(4K-DMDNet).The constraint of the reconstructed images in the frequency domain is strengthened.And a network structure that combines the residual method and sub-pixel convolution method is built,which effectively enhances the fitting ability of the network for inverse problems.The generalization of the 4K-DMDNet is demonstrated with binary,grayscale and 3D images.High-quality full-color optical reconstructions of the 4K holograms have been achieved at the wavelengths of 450 nm,520 nm,and 638 nm.展开更多
Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imba...Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time.展开更多
Accurate regulation of two-dimensional materials has become an effective strategy to develop a wide range of catalytic applications.The introduction of heterogeneous components has a significant impact on the performa...Accurate regulation of two-dimensional materials has become an effective strategy to develop a wide range of catalytic applications.The introduction of heterogeneous components has a significant impact on the performance of materials,which makes it difficult to discover and understand the structure-property relationships at the atomic level.Here,we developed a novel and efficient ensemble learning classifier with synthetic minority oversampling technique(SMOTE) to discover all possible arsenene catalysts with implanted heteroatoms for hydrogen evolution reaction(HER).A total of 850 doped arsenenes were collected as a database and 140 modified arsenene materials with different doping atoms and doping sites were identified as promising candidate catalysts for HER,with a machine learning prediction accuracy of 81%.Based on the results of machine learning,we proposed 13 low-cost and easily synthesized two-dimensional Fe-doped arsenene catalytic materials that are expected to contribute to high-efficient HER.The proposed ensemble method achieved high prediction accuracy,but millions of times faster to predict Gibbs free energies and only required a small amount of data.This study indicates that the presented ensemble learning classifier is capable of screening high-efficient catalysts,and can be further extended to predict other two-dimensional catalysts with delicate regulation.展开更多
基金Supported by Discipline Advancement Program of Shanghai Fourth People’s Hospital,No.SY-XKZT-2020-2013.
文摘BACKGROUND Postoperative delirium,particularly prevalent in elderly patients after abdominal cancer surgery,presents significant challenges in clinical management.AIM To develop a synthetic minority oversampling technique(SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients.METHODS In this retrospective cohort study,we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022.The incidence of postoperative delirium was recorded for 7 d post-surgery.Patients were divided into delirium and non-delirium groups based on the occurrence of postoperative delirium or not.A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium.The SMOTE technique was applied to enhance the model by oversampling the delirium cases.The model’s predictive accuracy was then validated.RESULTS In our study involving 611 elderly patients with abdominal malignant tumors,multivariate logistic regression analysis identified significant risk factors for postoperative delirium.These included the Charlson comorbidity index,American Society of Anesthesiologists classification,history of cerebrovascular disease,surgical duration,perioperative blood transfusion,and postoperative pain score.The incidence rate of postoperative delirium in our study was 22.91%.The original predictive model(P1)exhibited an area under the receiver operating characteristic curve of 0.862.In comparison,the SMOTE-based logistic early warning model(P2),which utilized the SMOTE oversampling algorithm,showed a slightly lower but comparable area under the curve of 0.856,suggesting no significant difference in performance between the two predictive approaches.CONCLUSION This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods,effectively addressing data imbalance.
文摘Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL)models find helpful in the detection and classification of anomalies.This article designs an oversampling with an optimal deep learning-based streaming data classification(OS-ODLSDC)model.The aim of the OSODLSDC model is to recognize and classify the presence of anomalies in the streaming data.The proposed OS-ODLSDC model initially undergoes preprocessing step.Since streaming data is unbalanced,support vector machine(SVM)-Synthetic Minority Over-sampling Technique(SVM-SMOTE)is applied for oversampling process.Besides,the OS-ODLSDC model employs bidirectional long short-term memory(Bi LSTM)for AD and classification.Finally,the root means square propagation(RMSProp)optimizer is applied for optimal hyperparameter tuning of the Bi LSTM model.For ensuring the promising performance of the OS-ODLSDC model,a wide-ranging experimental analysis is performed using three benchmark datasets such as CICIDS 2018,KDD-Cup 1999,and NSL-KDD datasets.
基金Project(52161135301)supported by the International Cooperation and Exchange of the National Natural Science Foundation of ChinaProject(202306370296)supported by China Scholarship Council。
文摘Rockburst is a common geological disaster in underground engineering,which seriously threatens the safety of personnel,equipment and property.Utilizing machine learning models to evaluate risk of rockburst is gradually becoming a trend.In this study,the integrated algorithms under Gradient Boosting Decision Tree(GBDT)framework were used to evaluate and classify rockburst intensity.First,a total of 301 rock burst data samples were obtained from a case database,and the data were preprocessed using synthetic minority over-sampling technique(SMOTE).Then,the rockburst evaluation models including GBDT,eXtreme Gradient Boosting(XGBoost),Light Gradient Boosting Machine(LightGBM),and Categorical Features Gradient Boosting(CatBoost)were established,and the optimal hyperparameters of the models were obtained through random search grid and five-fold cross-validation.Afterwards,use the optimal hyperparameter configuration to fit the evaluation models,and analyze these models using test set.In order to evaluate the performance,metrics including accuracy,precision,recall,and F1-score were selected to analyze and compare with other machine learning models.Finally,the trained models were used to conduct rock burst risk assessment on rock samples from a mine in Shanxi Province,China,and providing theoretical guidance for the mine's safe production work.The models under the GBDT framework perform well in the evaluation of rockburst levels,and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.
基金funded by the National Science Foundation of China(62006068)Hebei Natural Science Foundation(A2021402008),Natural Science Foundation of Scientific Research Project of Higher Education in Hebei Province(ZD2020185,QN2020188)333 Talent Supported Project of Hebei Province(C20221026).
文摘Imbalanced datasets are common in practical applications,and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationship between data attributes.However,the creation of fuzzy rules typically depends on expert knowledge,which may not fully leverage the label information in training data and may be subjective.To address this issue,a novel fuzzy rule oversampling approach is developed based on the learning vector quantization(LVQ)algorithm.In this method,the label information of the training data is utilized to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ.Subsequently,fuzzy rules are generated and adjusted to calculate rule weights.The number of new samples to be synthesized for each rule is then computed,and samples from the minority class are synthesized based on the newly generated fuzzy rules.This results in the establishment of a fuzzy rule oversampling method based on LVQ.To evaluate the effectiveness of this method,comparative experiments are conducted on 12 publicly available imbalance datasets with five other sampling techniques in combination with the support function machine.The experimental results demonstrate that the proposed method can significantly enhance the classification algorithm across seven performance indicators,including a boost of 2.15%to 12.34%in Accuracy,6.11%to 27.06%in G-mean,and 4.69%to 18.78%in AUC.These show that the proposed method is capable of more efficiently improving the classification performance of imbalanced data.
基金This work was supported by National Key R&D Program of China(Grant Numbers 2020YFB1005900,2022YFB3305802).
文摘Due to the anonymity of blockchain,frequent security incidents and attacks occur through it,among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses.Machine learningbased methods are believed to be promising for detecting ethereum Ponzi schemes.However,there are still some flaws in current research,e.g.,insufficient feature extraction of Ponzi scheme smart contracts,without considering class imbalance.In addition,there is room for improvement in detection precision.Aiming at the above problems,this paper proposes an ethereum Ponzi scheme detection scheme through opcode context analysis and adaptive boosting(AdaBoost)algorithm.Firstly,this paper uses the n-gram algorithm to extract more comprehensive contract opcode features and combine them with contract account features,which helps to improve the feature extraction effect.Meanwhile,adaptive synthetic sampling(ADASYN)is introduced to deal with class imbalanced data,and integrated with the Adaboost classifier.Finally,this paper uses the improved AdaBoost classifier for the identification of Ponzi scheme contracts.Experimentally,this paper tests our model in real-world smart contracts and compares it with representative methods in the aspect of F1-score and precision.Moreover,this article compares and discusses the state of art methods with our method in four aspects:data acquisition,data preprocessing,feature extraction,and classifier design.Both experiment and discussion validate the effectiveness of our model.
文摘HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort has been taken to reduce new HIV infections, but there are still a significant number of new infections reported. HIV prevalence is more skewed towards the key population who include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key population enrolled in a comprehensive HIV and AIDS programme by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow up, defaulted (dropped out, transferred out, or relocated) or died were classified as attrition;while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences of key population attrition in the 19 targeted counties, and used Kilifi county as an example to map attrition cases in smaller administrative areas (sub-county level). The study used synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets since the cases of attrition were much less than retention. The random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality and their survival time using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and way better than random chance with concordance indices greater than 0.75.
基金supported by the National Key Research and Development Program of China(2016YFB0500901)the Natural Science Foundation of Shanghai(18ZR1437200)the Satellite Mapping Technology and Application National Key Laboratory of Geographical Information Bureau(KLSMTA-201709)
文摘According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network(CNN) is designed to automatically extract small target features and suppress clutters in an end-to-end manner. The input of CNN is an original oversampling image while the output is a cluttersuppressed feature map. The CNN contains only convolution and non-linear operations, and the resolution of the output feature map is the same as that of the input image. The L1-norm loss function is used, and a mass of training data is generated to train the network effectively. Results show that compared with several baseline methods, the proposed method improves the signal clutter ratio gain and background suppression factor by 3–4 orders of magnitude, and has more powerful target detection performance.
文摘Learning from imbalanced data is one of the greatest challenging problems in binary classification,and this problem has gained more importance in recent years.When the class distribution is imbalanced,classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority.Therefore,the accuracy may be high,but the model cannot recognize data instances in the minority class to classify them,leading to many misclassifications.Different methods have been proposed in the literature to handle the imbalance problem,but most are complicated and tend to simulate unnecessary noise.In this paper,we propose a simple oversampling method based on Multivariate Gaussian distribution and K-means clustering,called GK-Means.The new method aims to avoid generating noise and control imbalances between and within classes.Various experiments have been carried out with six classifiers and four oversampling methods.Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and Accuracy.
文摘An efficient single-carrier symbol synchronization method is proposed in this paper,which can work under a very low oversampling rate.This method is based on the frequency aliasing squared timing recovery assisted by pilot symbols and time domain filter.With frequency aliasing squared timing recovery with pilots,it is accessible to estimate timing error under oversampling rate less than 2.The time domain filter simultaneously performs matched-filtering and arbitrary interpolation.Because of pilot assisting,timing error estimation can be free from alias and self noise,so our method has good performance.Compared with traditional time-domain methods requiring oversampling rate above 2,this method can be adapted to any rational oversampling rate including less than 2.Moreover,compared with symbol synchronization in frequency domain which can operate under low oversampling rate,our method saves the complicated operation of conversion between time domain and frequency domain.By low oversampling rate and resource saving filter,this method is suitable for ultra-high-speed communication systems under resource-restricted hardware.The paper carries on the simulation and realization under 64QAM system.The simulation result shows that the loss is very low(less than 0.5 dB),and the real-time implementation on field programmable gate array(FPGA)also works fine.
文摘Traditional ECG acquisition system lacks for flexibility. To improve the flexibility of ECG acquisition system and the signal-to-noise ratio of ECG, a new ECG acquisition system was designed based on DAQ card and Labview and oversampling was implemented in Labview. And analog signal conditioning circuit was improved on. The result indicated that the system could detect ECG signal accurately with high signal-to-noise ratio and the signal processing methods could be adjusted easily. So the new system can satisfy many kinds of ECG acquisition. It is a flexible experiment platform for exploring new ECG acquisition methods.
文摘A reconfigurable intelligent surface(RIS)aided massive multiple-input multiple-output(MIMO)system is considered,where the base station employs a large antenna array with low-cost and low-power 1-bit analog-to-digital converters(ADCs).To compensate for the per-formance loss caused by the coarse quantization,oversampling is applied at the receiver.The main challenge for the acquisition of cascaded channel state information in such a system is to handle the distortion caused by the 1-bit quantization and the sample correlation caused by oversampling.In this work,Bussgang decomposition is applied to deal with the coarse quantization,and a Markov chain is developed to char-acterize the banded structure of the oversampling filter.An approximate message-passing based algorithm is proposed for the estimation of the cascaded channels.Simulation results demonstrate that our proposed 1-bit systems with oversampling can approach the 2-bit systems in terms of the mean square error performance while the former consumes much less power at the receiver.
文摘In this editorial,we comment on the article by Hu et al entitled“Predictive modeling for postoperative delirium in elderly patients with abdominal malignancies using synthetic minority oversampling technique”.We wanted to draw attention to the general features of postoperative delirium(POD)as well as the areas where there are uncertainties and contradictions.POD can be defined as acute neurocognitive dysfunction that occurs in the first week after surgery.It is a severe postoperative complication,especially for elderly oncology patients.Although the underlying pathophysiological mechanism is not fully understood,various neuroinflammatory mechanisms and neurotransmitters are thought to be involved.Various assessment scales and diagnostic methods have been proposed for the early diagnosis of POD.As delirium is considered a preventable clinical entity in about half of the cases,various early prediction models developed with the support of machine learning have recently become a hot scientific topic.Unfortunately,a model with high sensitivity and specificity for the prediction of POD has not yet been reported.This situation reveals that all health personnel who provide health care services to elderly patients should approach patients with a high level of awareness in the perioperative period regarding POD.
文摘Delirium,a complex neurocognitive syndrome,frequently emerges following surgery,presenting diverse manifestations and considerable obstacles,especially among the elderly.This editorial delves into the intricate phenomenon of postoperative delirium(POD),shedding light on a study that explores POD in elderly individuals undergoing abdominal malignancy surgery.The study examines pathophysiology and predictive determinants,offering valuable insights into this challenging clinical scenario.Employing the synthetic minority oversampling technique,a predictive model is developed,incorporating critical risk factors such as comorbidity index,anesthesia grade,and surgical duration.There is an urgent need for accurate risk factor identification to mitigate POD incidence.While specific to elderly patients with abdominal malignancies,the findings contribute significantly to understanding delirium pathophysiology and prediction.Further research is warranted to establish standardized predictive for enhanced generalizability.
基金Supported partially by the National Natural Science Foundation of China for Distinguished Young Scholars (Grant No. 60625104)the National Natural Science Foundation of China (Grant Nos. 60890072, 60572094)the National Basic Research Program of China (Grant No.2009CB724003)
文摘Oversampling is widely used in practical applications of digital signal processing. As the fractional Fourier transform has been developed and applied in signal processing fields, it is necessary to consider the oversampling theorem in the fractional Fourier domain. In this paper, the oversampling theorem in the fractional Fourier domain is analyzed. The fractional Fourier spectral relation between the original oversampled sequence and its subsequences is derived first, and then the expression for exact reconstruction of the missing samples in terms of the subsequences is obtained. Moreover, by taking a chirp signal as an example, it is shown that, reconstruction of the missing samples in the oversampled signal is suitable in the fractional Fourier domain for the signal whose time-frequency distribution has the minimum support in the fractional Fourier domain.
基金Acknowledgements This research was partially supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200), the National Natural Science Foundation of China (Grant Nos. M1552006, 61403369, 61272427, and 61363030), Xinjiang Uygur Autonomous Region Science and Technology Project (201230123), Beijing Key Lab of Intelligent Telecommunication Software, Multimedia (ITSM201502), Guangxi Key Laboratory of Trusted Software (kx201418).
文摘Learning from imbalanced data is a challenging task in a wide range of applications, which attracts significant research efforts from machine learning and data mining community. As a natural approach to this issue, oversampling balances the training samples through replicating existing samples or synthesizing new samples. In general, synthesization outperforms replication by supplying additional information on the minority class. However, the additional information needs to follow the same normal distribution of the training set, which further constrains the new samples within the predefined range of training set. In this paper, we present the Wiener process oversampling (WPO) technique that brings the physics phenomena into sample synthesization. WPO constructs a robust decision region by expanding the attribute ranges in training set while keeping the same normal distribution. The satisfactory performance of WPO can be achieved with much lower computing complexity. In addition, by integrating WPO with ensemble learning, the WPOBoost algorithm outperforms many prevalent imbalance learning solutions.
基金supported by the Scientific Research Project of the Education Department of Hubei Province,China(No.Q20181508)the Youths Science Foundation of Wuhan Institute of Technology(No.k201622)+5 种基金the Surveying and Mapping Geographic Information Public Welfare Scientific Research Special Industry(No.201412014)the Educational Commission of Hubei Province,China(No.Q20151504)the National Natural Science Foundation of China(Nos.41501505,61502355,61502355,and 61502354)the China Postdoctoral Science Foundation(No.2015M581887)the Key Program of Higher Education Institutions of Henan Province,China(No.17A520040)and the Natural Science Foundation of Henan Province,China(No.162300410177)
文摘Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, just as with any data-mining technique, this approach requires sufficient training data(object usage scenarios). Existing approaches resolve the problem by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for object usage scenarios(OUSs). Our technique is based on the inheritance relationship in object-oriented programs. Given an object-oriented program p, generally, the OUSs that can be collected from a run of p are not more than the objects used during the run. With our technique, a maximum of n times more OUSs can be achieved, where n is the average number of super-classes of all general OUSs. To investigate the effect of our technique, we implement it in our previous prototype tool, ISpec Miner, and use the tool to mine protocols from several real-world programs. Experimental results show that our technique can collect 1.95 times more OUSs than general approaches. Additionally, accurate and complete API protocols are more likely to be achieved. Furthermore, our technique can mine API protocols for classes never even used in programs, which are valuable for validating software architectures, program documentation, and understanding. Although our technique will introduce some runtime overhead, it is trivial and acceptable.
基金We are grateful for financial supports from National Natural Science Foundation of China(62035003,61775117)China Postdoctoral Science Foundation(BX2021140)Tsinghua University Initiative Scientific Research Program(20193080075).
文摘Deep learning offers a novel opportunity to achieve both high-quality and high-speed computer-generated holography(CGH).Current data-driven deep learning algorithms face the challenge that the labeled training datasets limit the training performance and generalization.The model-driven deep learning introduces the diffraction model into the neural network.It eliminates the need for the labeled training dataset and has been extensively applied to hologram generation.However,the existing model-driven deep learning algorithms face the problem of insufficient constraints.In this study,we propose a model-driven neural network capable of high-fidelity 4K computer-generated hologram generation,called 4K Diffraction Model-driven Network(4K-DMDNet).The constraint of the reconstructed images in the frequency domain is strengthened.And a network structure that combines the residual method and sub-pixel convolution method is built,which effectively enhances the fitting ability of the network for inverse problems.The generalization of the 4K-DMDNet is demonstrated with binary,grayscale and 3D images.High-quality full-color optical reconstructions of the 4K holograms have been achieved at the wavelengths of 450 nm,520 nm,and 638 nm.
文摘Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time.
基金supported by the National Key R&D Program of China(No.2021YFC2100100)the National Natural Science Foundation of China(No.21901157)the Shanghai Science and Technology Project(No.21JC1403400)。
文摘Accurate regulation of two-dimensional materials has become an effective strategy to develop a wide range of catalytic applications.The introduction of heterogeneous components has a significant impact on the performance of materials,which makes it difficult to discover and understand the structure-property relationships at the atomic level.Here,we developed a novel and efficient ensemble learning classifier with synthetic minority oversampling technique(SMOTE) to discover all possible arsenene catalysts with implanted heteroatoms for hydrogen evolution reaction(HER).A total of 850 doped arsenenes were collected as a database and 140 modified arsenene materials with different doping atoms and doping sites were identified as promising candidate catalysts for HER,with a machine learning prediction accuracy of 81%.Based on the results of machine learning,we proposed 13 low-cost and easily synthesized two-dimensional Fe-doped arsenene catalytic materials that are expected to contribute to high-efficient HER.The proposed ensemble method achieved high prediction accuracy,but millions of times faster to predict Gibbs free energies and only required a small amount of data.This study indicates that the presented ensemble learning classifier is capable of screening high-efficient catalysts,and can be further extended to predict other two-dimensional catalysts with delicate regulation.