Abstract: Oversampling is commonly encountered in orthogonal frequency division multiplexing (OFDM) systems to improve various performance characteristics. In this paper, we investigate the performance and complexity of one-tap zero-forcing (ZF) and minimum mean-square error (MMSE) equalizers in oversampled OFDM systems. Theoretical analysis and simulation results show that oversampling not only reduces the noise at the equalizer output but also helps mitigate the ill effects of spectral nulls. One-tap equalizers therefore yield improved symbol-error-rate (SER) performance as the oversampling rate increases, but at the expense of increased system bandwidth and modest additional complexity.
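To make the comparison above concrete, the following NumPy sketch applies per-subcarrier (one-tap) ZF and MMSE equalization to received OFDM symbols; the function name, the unit-symbol-energy assumption, and the SNR parameterization are illustrative rather than taken from the paper.

```python
import numpy as np

def one_tap_equalize(Y, H, snr_linear, method="mmse"):
    """Per-subcarrier (one-tap) equalization of received OFDM symbols Y,
    given the channel frequency response H and the linear SNR."""
    if method == "zf":
        # Zero-forcing inverts the channel directly; noise is strongly
        # amplified on subcarriers near spectral nulls (|H| close to 0).
        return Y / H
    # MMSE regularizes the inversion by the inverse SNR, which limits the
    # noise enhancement that ZF suffers at spectral nulls.
    return np.conj(H) * Y / (np.abs(H) ** 2 + 1.0 / snr_linear)

# Toy usage: a two-tap channel whose frequency response has deep fades.
N = 64
H = np.fft.fft(np.array([1.0, 0.95]), N)              # channel frequency response
X = np.random.choice([-1.0, 1.0], N)                  # BPSK symbols
Y = H * X + 0.1 * (np.random.randn(N) + 1j * np.random.randn(N))
X_zf, X_mmse = one_tap_equalize(Y, H, 100, "zf"), one_tap_equalize(Y, H, 100)
```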
Funding: Supported by the Discipline Advancement Program of Shanghai Fourth People's Hospital, No. SY-XKZT-2020-2013.
Abstract: BACKGROUND: Postoperative delirium, particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management. AIM: To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients. METHODS: In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 days post-surgery. Patients were divided into delirium and non-delirium groups based on whether postoperative delirium occurred. A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium. The SMOTE technique was applied to enhance the model by oversampling the delirium cases. The model's predictive accuracy was then validated. RESULTS: In our study involving 611 elderly patients with abdominal malignant tumors, multivariate logistic regression analysis identified significant risk factors for postoperative delirium. These included the Charlson comorbidity index, American Society of Anesthesiologists classification, history of cerebrovascular disease, surgical duration, perioperative blood transfusion, and postoperative pain score. The incidence rate of postoperative delirium in our study was 22.91%. The original predictive model (P1) exhibited an area under the receiver operating characteristic curve of 0.862. In comparison, the SMOTE-based logistic early warning model (P2), which utilized the SMOTE oversampling algorithm, showed a slightly lower but comparable area under the curve of 0.856, suggesting no significant difference in performance between the two predictive approaches. CONCLUSION: This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods, effectively addressing data imbalance.
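A minimal sketch of the P1-versus-P2 comparison described above, using scikit-learn and imbalanced-learn; the random feature matrix merely stands in for the clinical risk factors, and the split, seeds, and model settings are illustrative assumptions.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Random numbers stand in for risk factors such as the Charlson comorbidity
# index, ASA classification, surgical duration, etc.; y = 1 marks delirium.
rng = np.random.default_rng(0)
X = rng.normal(size=(611, 6))
y = (rng.random(611) < 0.23).astype(int)          # ~23% positives, mimicking the imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# P1: plain logistic regression on the imbalanced training data.
p1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# P2: oversample only the training portion with SMOTE, then refit.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
p2 = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

for name, model in [("P1 (original)", p1), ("P2 (SMOTE)", p2)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(name, "AUC:", round(auc, 3))
```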
Abstract: Recently, anomaly detection (AD) in streaming data has gained significant attention among research communities due to its applicability in finance, business, healthcare, education, etc. Recent developments in deep learning (DL) models have proven helpful for the detection and classification of anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model. The aim of the OS-ODLSDC model is to recognize and classify the presence of anomalies in streaming data. The proposed OS-ODLSDC model initially undergoes a preprocessing step. Since streaming data is unbalanced, the support vector machine (SVM)-Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for the oversampling process. Besides, the OS-ODLSDC model employs bidirectional long short-term memory (BiLSTM) for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied for optimal hyperparameter tuning of the BiLSTM model. To ensure the promising performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed using three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
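The sketch below strings together the two main ingredients named in the abstract, SVM-SMOTE oversampling and a Bi-LSTM trained with RMSProp; the random data, the treatment of each feature as a time step, and the network sizes are assumptions for illustration only, not the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from imblearn.over_sampling import SVMSMOTE

# Flat random feature vectors stand in for preprocessed streaming records.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 20)).astype("float32")
y = (rng.random(2000) < 0.1).astype(int)                 # rare anomaly class

# 1) SVM-SMOTE synthesises minority samples near the SVM decision boundary.
X_bal, y_bal = SVMSMOTE(random_state=0).fit_resample(X, y)

# 2) Bi-LSTM classifier; each feature is treated as one step of a length-20 sequence.
class BiLSTMClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=32,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(64, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)         # logit from the last time step

model = BiLSTMClassifier()
xb = torch.tensor(X_bal, dtype=torch.float32).unsqueeze(-1)   # shape (N, 20, 1)
yb = torch.tensor(y_bal, dtype=torch.float32)

# 3) RMSProp optimiser, as named in the abstract, with a binary cross-entropy loss.
opt = torch.optim.RMSprop(model.parameters(), lr=1e-3)
for _ in range(3):                                       # a few illustrative epochs
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(xb), yb)
    loss.backward()
    opt.step()
```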
Funding: Project (52161135301) supported by the International Cooperation and Exchange of the National Natural Science Foundation of China; Project (202306370296) supported by the China Scholarship Council.
Abstract: Rockburst is a common geological disaster in underground engineering, which seriously threatens the safety of personnel, equipment and property. Utilizing machine learning models to evaluate rockburst risk is gradually becoming a trend. In this study, integrated algorithms under the Gradient Boosting Decision Tree (GBDT) framework were used to evaluate and classify rockburst intensity. First, a total of 301 rockburst data samples were obtained from a case database, and the data were preprocessed using the synthetic minority over-sampling technique (SMOTE). Then, rockburst evaluation models including GBDT, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Features Gradient Boosting (CatBoost) were established, and the optimal hyperparameters of the models were obtained through random grid search and five-fold cross-validation. Afterwards, the optimal hyperparameter configurations were used to fit the evaluation models, and the models were analyzed on the test set. To evaluate performance, metrics including accuracy, precision, recall, and F1-score were selected for analysis and comparison with other machine learning models. Finally, the trained models were used to conduct rockburst risk assessment on rock samples from a mine in Shanxi Province, China, providing theoretical guidance for the mine's safe production. The models under the GBDT framework perform well in the evaluation of rockburst levels, and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.
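A compact sketch of the tuning workflow described above, using LightGBM as one representative of the GBDT family; the synthetic features, the search grid, and the scoring choice are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from lightgbm import LGBMClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import classification_report

# Random numbers stand in for the 301 rockburst samples and their intensity levels.
rng = np.random.default_rng(0)
X = rng.normal(size=(301, 8))
y = rng.integers(0, 4, size=301)                        # four rockburst intensity classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance only the training set

# Random search over a small hyperparameter grid with five-fold cross-validation.
search = RandomizedSearchCV(
    LGBMClassifier(random_state=0),
    param_distributions={"n_estimators": [100, 300, 500],
                         "learning_rate": [0.01, 0.05, 0.1],
                         "max_depth": [3, 5, 7]},
    n_iter=10, cv=5, scoring="f1_macro", random_state=0)
search.fit(X_bal, y_bal)
print(classification_report(y_te, search.best_estimator_.predict(X_te)))
```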
Funding: Funded by the National Science Foundation of China (62006068), the Hebei Natural Science Foundation (A2021402008), the Natural Science Foundation of Scientific Research Project of Higher Education in Hebei Province (ZD2020185, QN2020188), and the 333 Talent Supported Project of Hebei Province (C20221026).
Abstract: Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationship between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may not fully leverage the label information in the training data and may be subjective. To address this issue, a novel fuzzy rule oversampling approach is developed based on the learning vector quantization (LVQ) algorithm. In this method, the label information of the training data is utilized to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ. Subsequently, fuzzy rules are generated and adjusted to calculate rule weights. The number of new samples to be synthesized for each rule is then computed, and samples from the minority class are synthesized based on the newly generated fuzzy rules. This results in the establishment of a fuzzy rule oversampling method based on LVQ. To evaluate the effectiveness of this method, comparative experiments are conducted on 12 publicly available imbalanced datasets with five other sampling techniques in combination with the support function machine. The experimental results demonstrate that the proposed method can significantly enhance the classification algorithm across seven performance indicators, including a boost of 2.15% to 12.34% in accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. These results show that the proposed method can more efficiently improve the classification performance of imbalanced data.
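The full fuzzy-rule generation is beyond a short snippet, but the LVQ step that drives the interval division can be sketched; the following is a plain LVQ1 implementation under assumed prototype counts and learning rate, with the fuzzy-rule construction itself omitted.

```python
import numpy as np

def lvq1(X, y, n_prototypes_per_class=2, lr=0.1, epochs=30, seed=0):
    """Minimal LVQ1: each prototype moves toward same-class samples and away
    from other-class samples; the trained prototypes can then delimit the
    attribute intervals used in the antecedents of If-Then fuzzy rules."""
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for c in np.unique(y):
        idx = rng.choice(np.flatnonzero(y == c), n_prototypes_per_class, replace=False)
        protos.append(X[idx].copy())
        labels.append(np.full(n_prototypes_per_class, c))
    W, Wy = np.vstack(protos), np.concatenate(labels)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(W - X[i], axis=1))   # best-matching prototype
            step = lr * (X[i] - W[j])
            W[j] += step if Wy[j] == y[i] else -step          # attract or repel
    return W, Wy

# Toy usage on a two-class, two-attribute dataset.
X = np.random.default_rng(1).normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
prototypes, prototype_labels = lvq1(X, y)
```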
Abstract: Delirium, a complex neurocognitive syndrome, frequently emerges following surgery, presenting diverse manifestations and considerable obstacles, especially among the elderly. This editorial delves into the intricate phenomenon of postoperative delirium (POD), shedding light on a study that explores POD in elderly individuals undergoing abdominal malignancy surgery. The study examines pathophysiology and predictive determinants, offering valuable insights into this challenging clinical scenario. Employing the synthetic minority oversampling technique, a predictive model is developed, incorporating critical risk factors such as the comorbidity index, anesthesia grade, and surgical duration. There is an urgent need for accurate risk factor identification to mitigate POD incidence. While specific to elderly patients with abdominal malignancies, the findings contribute significantly to understanding delirium pathophysiology and prediction. Further research is warranted to establish standardized predictive models for enhanced generalizability.
Abstract: In this editorial, we comment on the article by Hu et al entitled “Predictive modeling for postoperative delirium in elderly patients with abdominal malignancies using synthetic minority oversampling technique”. We wanted to draw attention to the general features of postoperative delirium (POD) as well as the areas where there are uncertainties and contradictions. POD can be defined as acute neurocognitive dysfunction that occurs in the first week after surgery. It is a severe postoperative complication, especially for elderly oncology patients. Although the underlying pathophysiological mechanism is not fully understood, various neuroinflammatory mechanisms and neurotransmitters are thought to be involved. Various assessment scales and diagnostic methods have been proposed for the early diagnosis of POD. As delirium is considered a preventable clinical entity in about half of the cases, various early prediction models developed with the support of machine learning have recently become a hot scientific topic. Unfortunately, a model with high sensitivity and specificity for the prediction of POD has not yet been reported. This situation reveals that all health personnel who provide health care services to elderly patients should approach patients with a high level of awareness in the perioperative period regarding POD.
Funding: We are grateful for financial support from the National Natural Science Foundation of China (62035003, 61775117), the China Postdoctoral Science Foundation (BX2021140), and the Tsinghua University Initiative Scientific Research Program (20193080075).
Abstract: Deep learning offers a novel opportunity to achieve both high-quality and high-speed computer-generated holography (CGH). Current data-driven deep learning algorithms face the challenge that the labeled training datasets limit the training performance and generalization. Model-driven deep learning introduces the diffraction model into the neural network. It eliminates the need for a labeled training dataset and has been extensively applied to hologram generation. However, existing model-driven deep learning algorithms face the problem of insufficient constraints. In this study, we propose a model-driven neural network capable of high-fidelity 4K computer-generated hologram generation, called the 4K Diffraction Model-driven Network (4K-DMDNet). The constraint on the reconstructed images in the frequency domain is strengthened, and a network structure that combines the residual method and the sub-pixel convolution method is built, which effectively enhances the network's fitting ability for inverse problems. The generalization of the 4K-DMDNet is demonstrated with binary, grayscale and 3D images. High-quality full-color optical reconstructions of the 4K holograms have been achieved at wavelengths of 450 nm, 520 nm, and 638 nm.
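The sub-pixel convolution idea mentioned above can be illustrated with a generic PyTorch block; this is not the authors' 4K-DMDNet architecture, and the channel counts and upscaling factor are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class SubPixelUp(nn.Module):
    """Generic sub-pixel upsampling block: a convolution expands the channel
    count by r*r, then PixelShuffle rearranges those channels into a feature
    map that is r times larger in each spatial dimension."""
    def __init__(self, in_ch, out_ch, r=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 16, 64, 64)            # a low-resolution feature map
print(SubPixelUp(16, 16)(x).shape)        # torch.Size([1, 16, 128, 128])
```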
Abstract: Most modern technologies, such as social media, smart cities, and the internet of things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers get stuck in the local optimum problem. As a result, it is necessary to look into new methods for dealing with large data collections. Several solutions have been proposed for overcoming this issue. The rapid growth of the available data threatens to limit the usefulness of many traditional methods. Methods such as oversampling and undersampling have shown great promise in addressing the issue of class imbalance. Among these techniques, the Synthetic Minority Oversampling Technique (SMOTE) has produced the best results by generating synthetic samples for the minority class to create a balanced dataset. The issue is that its practical applicability is restricted to problems involving tens of thousands of instances or fewer of each class. In this paper, we propose a parallel method using SMOTE and a MapReduce strategy, which distributes the operation of the algorithm among a group of computational nodes to address the aforementioned problem. Our proposed solution is divided into three stages. The first stage involves splitting the data into different blocks using a mapping function, followed by a pre-processing step for each mapped block that employs a hybrid SMOTE algorithm to solve the class imbalance problem. On each map block, a decision tree model is constructed. Finally, the decision tree blocks are combined to create a classification model. We used numerous datasets with up to 4 million instances in our experiments to test the proposed scheme's capabilities. As a result, the hybrid SMOTE appears to have good scalability within the proposed framework, and it also cuts down the processing time.
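A serial stand-in for the three-stage design sketched above: the data are split into blocks, each block is rebalanced with SMOTE and fitted with a decision tree (the map step), and the per-block trees are combined by majority vote (the reduce step). The block count, the synthetic data, and the voting rule are illustrative assumptions, not the paper's MapReduce implementation.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier

def map_block(block):
    """Map step: rebalance one data block with SMOTE, then fit a decision tree on it."""
    Xb, yb = block
    Xr, yr = SMOTE(random_state=0).fit_resample(Xb, yb)
    return DecisionTreeClassifier(random_state=0).fit(Xr, yr)

def reduce_models(models, X):
    """Reduce step: combine the per-block trees by majority vote."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 10))
y = (rng.random(4000) < 0.1).astype(int)                 # imbalanced labels
blocks = [(X[i::4], y[i::4]) for i in range(4)]          # 4 "mapper" blocks
models = [map_block(b) for b in blocks]                  # run in parallel on a real cluster
print("training accuracy:", (reduce_models(models, X) == y).mean())
```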
Funding: This work was supported by the National Key R&D Program of China (Grant Numbers 2020YFB1005900, 2022YFB3305802).
Abstract: Due to the anonymity of blockchain, frequent security incidents and attacks occur through it, among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses. Machine learning-based methods are believed to be promising for detecting Ethereum Ponzi schemes. However, there are still some flaws in current research, e.g., insufficient feature extraction from Ponzi scheme smart contracts and the failure to consider class imbalance. In addition, there is room for improvement in detection precision. To address these problems, this paper proposes an Ethereum Ponzi scheme detection scheme based on opcode context analysis and the adaptive boosting (AdaBoost) algorithm. Firstly, this paper uses the n-gram algorithm to extract more comprehensive contract opcode features and combines them with contract account features, which helps to improve the feature extraction effect. Meanwhile, adaptive synthetic sampling (ADASYN) is introduced to deal with class-imbalanced data and is integrated with the AdaBoost classifier. Finally, this paper uses the improved AdaBoost classifier for the identification of Ponzi scheme contracts. Experimentally, this paper tests our model on real-world smart contracts and compares it with representative methods in terms of F1-score and precision. Moreover, this article compares and discusses state-of-the-art methods against our method in four aspects: data acquisition, data preprocessing, feature extraction, and classifier design. Both the experiments and the discussion validate the effectiveness of our model.
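The two technical steps named above, n-gram opcode features and ADASYN-balanced AdaBoost, can be sketched as follows; the toy opcode string, the random feature matrix, and the class shift are illustrative assumptions, not real contract data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import AdaBoostClassifier
from imblearn.over_sampling import ADASYN

# 1) n-gram opcode features: bigrams over whitespace-separated opcode tokens.
vec = CountVectorizer(ngram_range=(2, 2), token_pattern=r"\S+")
vec.fit(["PUSH1 MSTORE CALLVALUE DUP1 ISZERO JUMPI"])
print(vec.get_feature_names_out())        # e.g. ['callvalue dup1', 'dup1 iszero', ...]

# 2) ADASYN rebalancing plus AdaBoost on an illustrative imbalanced matrix
#    (random numbers stand in for the combined opcode and account features).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = np.r_[np.zeros(360, dtype=int), np.ones(40, dtype=int)]   # 1 = Ponzi-like contract
X[y == 1] += 1.0                          # shift the minority class so it is learnable
X_bal, y_bal = ADASYN(random_state=0).fit_resample(X, y)
clf = AdaBoostClassifier(random_state=0).fit(X_bal, y_bal)
print("training accuracy:", clf.score(X, y))
```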
Funding: Supported by the National Key R&D Program of China (No. 2021YFC2100100), the National Natural Science Foundation of China (No. 21901157), and the Shanghai Science and Technology Project (No. 21JC1403400).
Abstract: Accurate regulation of two-dimensional materials has become an effective strategy to develop a wide range of catalytic applications. The introduction of heterogeneous components has a significant impact on the performance of materials, which makes it difficult to discover and understand structure-property relationships at the atomic level. Here, we developed a novel and efficient ensemble learning classifier with the synthetic minority oversampling technique (SMOTE) to discover all possible arsenene catalysts with implanted heteroatoms for the hydrogen evolution reaction (HER). A total of 850 doped arsenenes were collected as a database, and 140 modified arsenene materials with different doping atoms and doping sites were identified as promising candidate catalysts for HER, with a machine learning prediction accuracy of 81%. Based on the results of machine learning, we proposed 13 low-cost and easily synthesized two-dimensional Fe-doped arsenene catalytic materials that are expected to contribute to highly efficient HER. The proposed ensemble method achieved high prediction accuracy while being millions of times faster at predicting Gibbs free energies and requiring only a small amount of data. This study indicates that the presented ensemble learning classifier is capable of screening high-efficiency catalysts and can be further extended to predict other two-dimensional catalysts with delicate regulation.
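One way to realize the "ensemble learning classifier with SMOTE" idea in code is to put the oversampler inside an imbalanced-learn Pipeline so it is refit on each cross-validation training fold; the descriptor matrix, label rule, and base learners below are assumptions, not the paper's actual features or models.

```python
import numpy as np
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

# Random numbers stand in for the doped-arsenene descriptors; y = 1 marks a
# candidate whose Gibbs free energy would fall inside the active HER window.
rng = np.random.default_rng(0)
X = rng.normal(size=(850, 10))
y = (rng.random(850) < 0.16).astype(int)

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft")

# SMOTE inside the pipeline keeps the oversampling restricted to the training folds.
model = Pipeline([("smote", SMOTE(random_state=0)), ("ensemble", ensemble)])
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```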
Funding: Funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) through Project Number IFP22UQU4170008DSR0056.
Abstract: Alzheimer's disease (AD) is a neurodegenerative impairment that affects a person's behavior, thinking, and memory. The most common symptoms of AD are memory loss and early aging. Beyond these, AD has several other serious impacts. Although AD cannot be cured permanently, its impact can be mitigated by early-stage detection, which is the most challenging task for controlling and mitigating the impact of AD. To address this issue, the study proposes a predictive model to detect AD in the initial phase based on machine learning and deep learning approaches. To build the predictive model, open-source data was collected in which images from five stages of AD were available: Cognitive Normal (CN), Early Mild Cognitive Impairment (EMCI), Mild Cognitive Impairment (MCI), Late Mild Cognitive Impairment (LMCI), and AD. Every stage of AD is considered a class, and the dataset was divided into three configurations: binary-class, three-class, and five-class. In this research, we applied different preprocessing steps with augmentation techniques to efficiently identify AD. A random oversampling technique is integrated to handle the imbalance in the target classes, mitigating model overfitting and bias. Then three machine learning classifiers, random forest (RF), K-nearest neighbor (KNN), and support vector machine (SVM), and two deep learning methods, convolutional neural network (CNN) and artificial neural network (ANN), were applied to these datasets. After analyzing the performance of the models and the datasets, it was found that the CNN on the binary-class configuration performed best, with 88.20% accuracy. The results of the study indicate that the model has high potential for detecting AD in the initial phase.
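A small sketch of the oversample-then-classify part of this pipeline with the three machine learning classifiers named above; flattened random features stand in for the preprocessed images, and the binary CN-versus-AD setting is assumed for brevity.

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))                   # flattened image features (illustrative)
y = (rng.random(1000) < 0.2).astype(int)          # 1 = AD, 0 = CN (imbalanced)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling duplicates minority samples until the classes are balanced.
X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)

for name, clf in [("RF", RandomForestClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier()),
                  ("SVM", SVC())]:
    clf.fit(X_bal, y_bal)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```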
Funding: Supported by the Japan Society for the Promotion of Science KAKENHI (Grant No. JP22H01580).
Abstract: During tunnel boring machine (TBM) excavation, lithology identification is an important issue for understanding tunnelling performance and avoiding time-consuming excavation. However, site investigation generally lacks ground samples, and the information is subjective, heterogeneous, and imbalanced due to mixed ground conditions. In this study, an unsupervised (K-means) and synthetic minority oversampling technique (SMOTE)-guided light gradient boosting machine (LightGBM) classifier is proposed to identify soft ground tunnel classes and to address the imbalance in the tunnelling data. During the tunnel excavation, an earth pressure balance (EPB) TBM recorded 18 different operational parameters along with the three main tunnel lithologies. The proposed model is implemented using the Python low-code PyCaret library. Next, four decision tree-based classifiers were obtained in a short time period with automatic hyperparameter tuning to determine the best model for the clustering-guided SMOTE application. In addition, the Shapley additive explanation (SHAP) was implemented to address the model black-box problem. The proposed model was evaluated using different metrics such as accuracy, F1 score, precision, recall, and the receiver operating characteristic (ROC) curve to obtain a reasonable outcome for the minority class. The results show that the proposed model can provide significant tunnel lithology identification based on the operational parameters of the EPB-TBM. The proposed method can be applied to heterogeneous tunnel formations with several TBM operational parameters to describe the tunnel lithologies for efficient tunnelling.
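The clustering-guided SMOTE idea can be approximated outside PyCaret by oversampling the minority class separately inside each K-means cluster and then fitting LightGBM; the cluster count, the minimum-sample rule, and the random stand-in for the 18 EPB-TBM parameters are assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE
from lightgbm import LGBMClassifier

def cluster_guided_smote(X, y, n_clusters=5, minority_label=1, seed=0):
    """Oversample the minority class separately inside each K-means cluster
    that contains enough minority samples for SMOTE's neighbour search."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    parts_X, parts_y = [], []
    for c in np.unique(clusters):
        Xc, yc = X[clusters == c], y[clusters == c]
        n_min = int((yc == minority_label).sum())
        if 5 < n_min < len(yc):                      # both classes present, enough neighbours
            Xc, yc = SMOTE(random_state=seed).fit_resample(Xc, yc)
        parts_X.append(Xc)
        parts_y.append(yc)
    return np.vstack(parts_X), np.concatenate(parts_y)

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 18))                      # 18 TBM operational parameters (illustrative)
y = (rng.random(1500) < 0.12).astype(int)            # minority lithology class
X_bal, y_bal = cluster_guided_smote(X, y)
clf = LGBMClassifier(random_state=0).fit(X_bal, y_bal)
```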
Funding: This work was funded by the Deanship of Scientific Research at Jouf University under Grant Number DSR2022-RG-0105.
Abstract: Twitter has emerged as a platform that produces new data every day through its users, which can be utilized for various purposes. People express their unique ideas and views on multiple topics, thus providing vast knowledge. Sentiment analysis is critical from the corporate and political perspectives as it can impact decision-making. Since the proliferation of COVID-19, it has become an important challenge to detect the sentiment of COVID-19-related tweets so that people's opinions can be tracked. The purpose of this research is to detect the sentiment of people regarding this problem with limited data, which can be challenging considering the various textual characteristics that must be analyzed. Hence, this research presents a deep learning-based model that utilizes the benefits of random minority oversampling combined with class label analysis to achieve the best results for sentiment analysis. This research specifically focuses on utilizing class label analysis to deal with the multiclass problem by combining class labels that share a similar overall sentiment. This can be particularly helpful when dealing with smaller datasets. Furthermore, our proposed model integrates various preprocessing steps with random minority oversampling and various deep learning algorithms, including standard and bi-directional deep learning algorithms. This research explores several algorithms and their impact on sentiment analysis tasks and concludes that bidirectional neural networks do not provide any advantage over standard neural networks, as the standard neural networks provide slightly better results than their bidirectional counterparts. The experimental results validate that our model offers excellent results, with a validation accuracy of 92.5% and an F1 measure of 0.92.
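The class-label-analysis step, merging sentiment labels that share an overall polarity and then applying random minority oversampling, can be sketched as follows; the label names, counts, and merge map are invented for illustration.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

# Five-way sentiment labels merged into three by overall polarity.
labels5 = np.array(["very_neg", "neg", "neutral", "pos", "very_pos"] * 20 + ["neutral"] * 40)
merge = {"very_neg": "neg", "neg": "neg", "neutral": "neutral",
         "pos": "pos", "very_pos": "pos"}
labels3 = np.array([merge[label] for label in labels5])

# Random minority oversampling on the merged labels; the index column is a
# placeholder for the tokenized tweets that would normally be resampled.
X = np.arange(len(labels3)).reshape(-1, 1)
X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X, labels3)
print(Counter(labels3), "->", Counter(y_bal))
```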
Abstract: This paper aims to build an employee attrition classification model based on the Stacking algorithm. An oversampling algorithm is applied to address the issue of data imbalance, and the random forest feature importance ranking method is used to mitigate the overfitting problem after data cleaning and preprocessing. Then, different algorithms are used to establish classification models as control experiments, and R-squared indicators are used for comparison. Finally, the Stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
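A compact sketch of the oversampling, random forest feature ranking, and Stacking steps described above, using scikit-learn's StackingClassifier; the random HR features, the top-8 cutoff, and the choice of base and meta learners are illustrative assumptions.

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 12))                    # stand-in HR features
y = (rng.random(800) < 0.16).astype(int)          # 1 = employee left (minority class)

# 1) Oversample the minority (attrition) class.
X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X, y)

# 2) Rank features with random forest importances and keep the top 8.
rf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
top = np.argsort(rf.feature_importances_)[::-1][:8]

# 3) Stack two base learners under a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_bal[:, top], y_bal)
print("training accuracy:", stack.score(X_bal[:, top], y_bal))
```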
Abstract: HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in the sustainable development goals (SDGs). A colossal amount of effort has been taken to reduce new HIV infections, but there are still a significant number of new infections reported. HIV prevalence is more skewed towards key populations, who include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key populations enrolled in a comprehensive HIV and AIDS programme by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow-up, defaulted (dropped out, transferred out, or relocated) or died were classified as attrition, while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences of key population attrition in the 19 targeted counties, and used Kilifi county as an example to map attrition cases in smaller administrative areas (sub-county level). The study used the synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets, since the cases of attrition were far fewer than those of retention. The random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality and their survival time using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and much better than random chance, with concordance indices greater than 0.75.
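Because the cohort mixes continuous and categorical attributes, SMOTE-NC is the natural balancing step; the sketch below shows that step only, with invented columns, and leaves the random survival forest fit (e.g. via scikit-survival) out of scope.

```python
import numpy as np
from imblearn.over_sampling import SMOTENC

# Invented mix of continuous (age, months in programme) and categorical
# (key-population group, county code) columns; y = 1 marks attrition.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(35, 8, 500),        # age
                     rng.uniform(1, 24, 500),       # months enrolled
                     rng.integers(0, 3, 500),       # key-population group (FSW/MSM/PWID)
                     rng.integers(0, 19, 500)])     # county code
y = (rng.random(500) < 0.15).astype(int)

# Columns 2 and 3 are treated as nominal; SMOTE-NC interpolates only the
# continuous columns and picks the most frequent category among neighbours.
X_bal, y_bal = SMOTENC(categorical_features=[2, 3], random_state=0).fit_resample(X, y)
print(X.shape, "->", X_bal.shape)
```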
Abstract: This study aims to develop a low-cost refractometer for measuring the sucrose content of fruit juice, which is an important factor affecting human health. While laboratory-grade refractometers are expensive and unsuitable for personal use, existing low-cost commercial options lack stability and accuracy. To address this gap, we propose a refractometer that replaces the expensive CCD sensor and light source with a conventional LED and a reasonably priced CMOS sensor. By analyzing the output waveform pattern of the CMOS sensor, we achieve high precision with a personal-use-appropriate accuracy of 0.1%. We tested the proposed refractometer by conducting 100 repeated measurements on various fruit juice samples, and the results demonstrate its reliability and consistency. Running on a 48 MHz ARM processor, the algorithm can acquire data within 0.2 seconds. Our low-cost refractometer is suitable for personal health management and small-scale production, providing an affordable and reliable method for measuring sucrose concentration in fruit juice. It improves upon the existing low-cost options by offering better stability and accuracy. This accessible tool has potential applications in optimizing the sucrose content of fruit juice for better health and quality control.
Funding: Supported by the National Key Research and Development Program of China (2016YFB0500901), the Natural Science Foundation of Shanghai (18ZR1437200), and the Satellite Mapping Technology and Application National Key Laboratory of Geographical Information Bureau (KLSMTA-201709).
Abstract: According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network (CNN) is designed to automatically extract small target features and suppress clutter in an end-to-end manner. The input of the CNN is an original oversampled image, while the output is a clutter-suppressed feature map. The CNN contains only convolution and non-linear operations, and the resolution of the output feature map is the same as that of the input image. The L1-norm loss function is used, and a large amount of training data is generated to train the network effectively. Results show that, compared with several baseline methods, the proposed method improves the signal-to-clutter ratio gain and the background suppression factor by 3–4 orders of magnitude and has more powerful target detection performance.
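A minimal PyTorch sketch of the kind of network the abstract describes, seven convolutions with nonlinearities, padding that preserves the input resolution, and an L1 loss against a clutter-suppressed target; the channel widths, patch size, and zero target are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Seven convolution layers with ReLU nonlinearities in between; padding keeps
# the output feature map at the same resolution as the input image.
layers, channels = [], [1, 16, 16, 16, 16, 16, 16]
for i in range(6):
    layers += [nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, padding=1), nn.ReLU()]
layers.append(nn.Conv2d(16, 1, kernel_size=3, padding=1))   # 7th conv outputs the feature map
net = nn.Sequential(*layers)

img = torch.randn(4, 1, 64, 64)            # oversampled infrared image patches (illustrative)
target = torch.zeros_like(img)             # clutter-suppressed ground-truth maps
loss = nn.L1Loss()(net(img), target)       # L1-norm loss, as in the abstract
loss.backward()
```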
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41941019) and the State Key Laboratory of Hydroscience and Engineering (Grant No. 2019-KY-03).
Abstract: Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During the TBM tunnelling process, a large volume of operation data is generated, reflecting the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate the rock mass quality. This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes were obtained after data preprocessing. Then, through a tree-based feature selection method, 10 key TBM operation parameters were selected, and the mean values of the 10 selected features in the stable phase, after removing outliers, were calculated as the inputs of the classifiers. The preprocessed data were randomly divided into a training set (90%) and a test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers were established for comparison: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR) and multilayer perceptron (MLP), where the hyper-parameters of each classifier were optimised using the grid search method. The prediction results show that the stacking ensemble classifier has better performance than the individual classifiers, and it shows a more powerful learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.
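The tree-based selection of the 10 key operation parameters can be illustrated with scikit-learn's SelectFromModel; the random stand-in data, the gradient-boosting importance source, and the exact-10 cutoff are assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# Random numbers stand in for candidate TBM operation parameters and rock mass classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 30))
y = rng.integers(0, 4, size=2000)

# Rank features by gradient-boosting importances and keep exactly the top 10.
selector = SelectFromModel(GradientBoostingClassifier(random_state=0),
                           threshold=-np.inf, max_features=10)
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)      # (2000, 30) -> (2000, 10)
```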