The development of defect prediction plays a significant role in improving software quality. Such predictions are used to identify defective modules before testing and to minimize time and cost. Software with defects negatively impacts operational costs and ultimately affects customer satisfaction. Numerous approaches exist to predict software defects; however, timely and accurate prediction of software bugs remains a major challenge. To improve timely and accurate software defect prediction, a novel technique called Nonparametric Statistical feature scaled QuAdratic regressive convolution Deep nEural Network (SQADEN) is introduced. The proposed SQADEN technique mainly includes two major processes, namely metric (feature) selection and classification. First, SQADEN uses the nonparametric statistical Torgerson–Gower scaling technique to identify the relevant software metrics by measuring their similarity with the dice coefficient. The feature selection process is used to minimize the time complexity of software fault prediction. With the selected metrics, software fault prediction is performed with the help of the Quadratic Censored regressive convolution deep neural network-based classification. The deep learning classifier analyzes the training and testing samples using the contingency correlation coefficient. The softstep activation function is used to provide the final fault prediction results. To minimize the error, the Nelder–Mead method is applied to solve non-linear least-squares problems. Finally, accurate classification results with a minimum error are obtained at the output layer. Experimental evaluation is carried out with different quantitative metrics such as accuracy, precision, recall, F-measure, and time complexity. The analyzed results demonstrate the superior performance of the proposed SQADEN technique, with accuracy, sensitivity and specificity higher by 3%, 3%, 2% and 3%, and time and space lower by 13% and 15%, when compared with two state-of-the-art methods.
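The dice coefficient mentioned above is a standard set-overlap similarity measure. The following is a minimal, illustrative Python sketch of how such a similarity between two binary feature vectors could be computed; the function name and the binary-vector representation are assumptions for illustration, not details from the paper.

```python
import numpy as np

def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity between two binary vectors: 2|A∩B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

# Example: similarity between two metric-presence vectors
m1 = np.array([1, 0, 1, 1, 0])
m2 = np.array([1, 1, 1, 0, 0])
print(dice_coefficient(m1, m2))  # 2*2 / (3+3) = 0.666...
```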
Pre-harvest yield prediction of ratoon rice is critical for guiding crop interventions in precision agriculture. However, the unique agronomic practice in rice ratooning (i.e., varied stubble-height treatment) can lead to inconsistent rice phenology, which has a significant impact on yield prediction of ratoon rice. Multi-temporal unmanned aerial vehicle (UAV)-based remote sensing can likely monitor ratoon rice productivity and reflect maximum yield potential across growing seasons, thereby improving yield prediction compared with previous methods. Thus, in this study, we explored the performance of combining agronomic practice information (API) with single-phase multi-spectral features [vegetation indices (VIs) and texture (Tex) features] in predicting ratoon rice yield, and developed a new UAV-based method that retrieves the yield formation process using multi-temporal features, which were effective in improving the yield forecasting accuracy of ratoon rice. The results showed that the integrated use of VIs, Tex and API (VIs&Tex+API) improved the accuracy of yield prediction compared with any single-phase UAV imagery-based feature, with the panicle initiation stage being the best period for yield prediction (R^(2)=0.732, RMSE=0.406, RRMSE=0.101). More importantly, compared with previous multi-temporal UAV-based methods, the proposed multi-temporal method (multi-temporal model VIs&Tex: R^(2)=0.795, RMSE=0.298, RRMSE=0.072) can increase R^(2) by 0.020-0.111 and decrease RMSE by 0.020-0.080 in crop yield forecasting. This study provides an effective method for accurate pre-harvest yield prediction of ratoon rice in precision agriculture, which is of great significance for taking timely measures to ensure ratoon rice production and food security.
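Vegetation indices of the kind referenced above are typically simple per-pixel band ratios computed from multi-spectral imagery. As one hedged illustration (NDVI is a commonly used vegetation index, although the specific indices used in the study are not listed here), a per-pixel computation might look like the following.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + eps)

# Example with a tiny 2x2 "image" of reflectance values
nir_band = np.array([[0.60, 0.55], [0.70, 0.40]])
red_band = np.array([[0.10, 0.20], [0.15, 0.30]])
print(ndvi(nir_band, red_band))
```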
Financial crisis prediction (FCP) has received significant attention in the financial sector for decision-making. Proper forecasting of the number of firms likely to fail is important for determining the growth index and strength of a nation's economy. Conventionally, numerous approaches have been developed for the design of accurate FCP processes. At the same time, classifier efficacy and predictive accuracy are inadequate for real-time applications. In addition, several established techniques perform well on specific datasets but do not adapt to other datasets. Thus, there is a need to develop an effective prediction technique with optimal classifier performance that is adaptable to various datasets. This paper presents a novel multi-verse optimization (MVO) based feature selection (FS) with an optimal variational autoencoder (OVAE) model for FCP. The proposed multi-verse optimization based feature selection with optimal variational autoencoder (MVOFS-OVAE) model mainly aims to forecast the financial crisis. To achieve this, the proposed MVOFS-OVAE model first pre-processes the financial data using min-max normalization. In addition, the MVOFS-OVAE model designs a feature subset selection process using the MVOFS approach. Next, the variational autoencoder (VAE) model is applied for the categorization of financial data into financial crisis or non-financial crisis. Finally, the differential evolution (DE) algorithm is utilized for the parameter tuning of the VAE model. A series of simulations on the benchmark dataset reported the superiority of the MVOFS-OVAE approach over recent state-of-the-art approaches.
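Min-max normalization, the pre-processing step named above, rescales each feature to the [0, 1] range. A minimal sketch using scikit-learn follows; the synthetic data and column meanings are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative financial indicators (rows = firms, columns = features)
X = np.array([[1.2e6, 0.35, 12.0],
              [8.5e5, 0.80, 45.0],
              [2.1e6, 0.10,  3.5]])

scaler = MinMaxScaler()            # x' = (x - min) / (max - min), per column
X_scaled = scaler.fit_transform(X)
print(X_scaled)                     # every column now lies in [0, 1]
```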
Predicting disruptions across different tokamaks is necessary for next-generation devices. Future large-scale tokamaks can hardly tolerate disruptions during high-performance discharge, which makes it difficult for current data-driven methods to obtain an acceptable result. A machine learning method capable of transferring a disruption prediction model trained on one tokamak to another is required to solve the problem. The key is a feature extractor that is able to extract common disruption precursor traces in tokamak diagnostic data and can be easily transferred to other tokamaks. Based on the concerns above, this paper presents a deep feature extractor, namely the fusion feature extractor (FFE), which is designed specifically for extracting disruption precursor features from common diagnostics on tokamaks. Furthermore, an FFE-based disruption predictor on J-TEXT is demonstrated. The feature extractor is aimed at extracting disruption-related precursors and is designed according to the precursors of disruption and their representations in common tokamak diagnostics. A strong inductive bias on tokamak diagnostic data is introduced. The paper presents the evolution of the neural network feature extractor and its comparison against general deep neural networks, as well as physics-based feature extraction with a traditional machine learning method. Results demonstrate that the FFE can reach an effect similar to physics-guided manual feature extraction and obtains a better result compared with other deep learning methods.
Cross-project software defect prediction (CPDP) aims to enhance defect prediction in target projects with limited or no historical data by leveraging information from related source projects. The existing CPDP approaches rely on static metrics or dynamic syntactic features, which have shown limited effectiveness in CPDP due to their inability to capture higher-level system properties, such as complex design patterns, relationships between multiple functions, and dependencies in different software projects, that are important for CPDP. This paper introduces a novel approach, a graph-based feature learning model for CPDP (GB-CPDP), that utilizes NetworkX to extract features and learn representations of program entities from control flow graphs (CFGs) and data dependency graphs (DDGs). These graphs capture the structural and data dependencies within the source code. The proposed approach employs Node2Vec to transform CFGs and DDGs into numerical vectors and leverages Long Short-Term Memory (LSTM) networks to learn predictive models. The process involves graph construction, feature learning through graph embedding and LSTM, and defect prediction. Experimental evaluation using nine open-source Java projects from the PROMISE dataset demonstrates that GB-CPDP outperforms state-of-the-art CPDP methods in terms of F1-measure and Area Under the Curve (AUC). The results showcase the effectiveness of GB-CPDP in improving the performance of cross-project defect prediction.
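At the core of Node2Vec-style graph embedding is generating random walks over the graph and treating them as "sentences" for a skip-gram model. The sketch below is illustrative only: the toy graph, walk parameters, and function names are assumptions, and it shows how a control-flow-like graph could be built with NetworkX and sampled with unbiased walks (the p = q = 1 special case of Node2Vec).

```python
import random
import networkx as nx

# Toy control-flow-like graph: nodes are basic blocks, edges are control flow
g = nx.DiGraph()
g.add_edges_from([("entry", "cond"), ("cond", "then"), ("cond", "else"),
                  ("then", "exit"), ("else", "exit")])

def random_walks(graph, walk_length=5, walks_per_node=10, seed=0):
    """Generate unbiased random walks (Node2Vec with p = q = 1)."""
    rng = random.Random(seed)
    walks = []
    for node in graph.nodes():
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < walk_length:
                neighbors = list(graph.successors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

walks = random_walks(g)
print(walks[:3])  # these token sequences would then be fed to a skip-gram model
```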
In geometry processing, symmetry research benefits from global geometric features of complete shapes, but the shape of an object captured in real-world applications is often incomplete due to limited sensor resolution, a single viewpoint, and occlusion. Different from existing works that predict symmetry from the complete shape, we propose a learning approach for symmetry prediction based on a single RGB-D image. Instead of directly predicting the symmetry from incomplete shapes, our method consists of two modules, i.e., the multi-modal feature fusion module and the detection-by-reconstruction module. Firstly, we build a channel-transformer network (CTN) to extract cross-fusion features from the RGB-D image as the multi-modal feature fusion module, which helps us aggregate features from the color and the depth separately. Then, our self-reconstruction network based on a 3D variational auto-encoder (3D-VAE) takes the global geometric features as input, followed by a symmetry prediction network to detect the symmetry. Our experiments are conducted on three public datasets: ShapeNet, YCB, and ScanNet, and we demonstrate that our method can produce reliable and accurate results.
This work constructed a machine learning (ML) model to predict the atmospheric corrosion rate of low-alloy steels (LAS). The material properties of LAS, environmental factors, and exposure time were used as the input, while the corrosion rate was used as the output. Six different ML algorithms were used to construct the proposed model. Through optimization and filtering, the eXtreme gradient boosting (XGBoost) model exhibited good corrosion rate prediction accuracy. The features of material properties were then transformed into atomic and physical features using the proposed property transformation approach, and the dominant descriptors that affected the corrosion rate were filtered using recursive feature elimination (RFE) as well as XGBoost methods. The established ML models exhibited better prediction performance and generalization ability via property transformation descriptors. In addition, the SHapley Additive exPlanations (SHAP) method was applied to analyze the relationship between the descriptors and the corrosion rate. The results showed that the property transformation model could effectively help with analyzing the corrosion behavior, thereby significantly improving the generalization ability of corrosion rate prediction models.
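The XGBoost-plus-RFE-plus-SHAP pipeline described above can be sketched with standard open-source tooling. The snippet below is a minimal, hedged illustration on synthetic data; the feature count, RFE settings, and synthetic targets are assumptions, not values from the study.

```python
import numpy as np
from sklearn.feature_selection import RFE
from xgboost import XGBRegressor
import shap

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                                        # synthetic descriptors
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)  # synthetic corrosion rate

# Recursive feature elimination with an XGBoost regressor as the base estimator
selector = RFE(XGBRegressor(n_estimators=200), n_features_to_select=4).fit(X, y)
X_sel = X[:, selector.support_]

model = XGBRegressor(n_estimators=200).fit(X_sel, y)

# SHAP values explain each descriptor's contribution to the predicted rate
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_sel)
print(shap_values.shape)                                              # (n_samples, n_selected_features)
```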
Highway safety researchers focus on crash injury severity, utilizing deep learning, specifically deep neural networks (DNN), deep convolutional neural networks (D-CNN), and deep recurrent neural networks (D-RNN), as the preferred method for modeling accident severity. Deep learning's strength lies in handling intricate relationships within extensive datasets, making it popular for accident severity level (ASL) prediction and classification. Despite prior success, there is a need for an efficient system that recognizes ASL in diverse road conditions. To address this, we present an innovative Accident Severity Level Prediction Deep Learning (ASLP-DL) framework, incorporating DNN, D-CNN, and D-RNN models fine-tuned through iterative hyperparameter selection with Stochastic Gradient Descent. The framework optimizes hidden layers and integrates data augmentation, Gaussian noise, and dropout regularization for improved generalization. Sensitivity and factor contribution analyses identify influential predictors. Evaluated on three diverse crash record databases (NCDB 2018–2019, UK 2015–2020, and US 2016–2021), the D-RNN model excels with an ACC score of 89.0281%, an ROC area of 0.751, an F-estimate of 0.941, and a Kappa score of 0.0629 on the NCDB dataset. The proposed framework consistently outperforms traditional methods, existing machine learning, and deep learning techniques.
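A generic deep network with the regularization tricks named above (Gaussian noise injection, dropout, and SGD-based training) might look like the following Keras sketch. Layer sizes, noise level, and dropout rate are illustrative assumptions, not the framework's actual configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

num_features, num_classes = 20, 4   # illustrative dimensions

model = models.Sequential([
    layers.Input(shape=(num_features,)),
    layers.GaussianNoise(0.1),              # noise injection for regularization
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),                    # dropout regularization
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```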
The complex sand-casting process, combined with the interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency, which includes the random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, which includes four types of defects and the corresponding process parameters, was used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, this model serves as an experimental process in the Monte Carlo method to estimate a better temperature distribution. The prediction model, when applied to the factory, greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
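The random-forest-plus-Monte-Carlo idea can be pictured with a small scikit-learn sketch: train a classifier, read its Gini-based feature importances, then randomly sample candidate process parameters and keep the setting with the lowest predicted defect probability. All data, parameter names, and ranges below are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
# Synthetic process parameters: pouring temperature, moisture, permeability
X = rng.uniform([1300, 2.0, 80], [1450, 5.0, 160], size=(500, 3))
y = (X[:, 0] < 1350).astype(int)          # synthetic "gas porosity defect" label

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("Gini importances:", rf.feature_importances_)

# Monte Carlo search: sample candidate parameter settings, minimize defect probability
candidates = rng.uniform([1300, 2.0, 80], [1450, 5.0, 160], size=(10_000, 3))
defect_prob = rf.predict_proba(candidates)[:, 1]
best = candidates[np.argmin(defect_prob)]
print("Best sampled setting:", best, "predicted defect prob:", defect_prob.min())
```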
Rock failure can cause serious geological disasters, and the non-extensive statistical features of electric potential (EP) are expected to provide valuable information for disaster prediction. In this paper, uniaxial compression experiments with EP monitoring were carried out on fine sandstone, marble and granite samples under four displacement rates. The Tsallis entropy q value of the EP signals is used to analyze the self-organization evolution of rock failure. Then the influence of displacement rate and rock type on the q value is explored through mineral structure and fracture modes. A self-organized critical prediction method based on the q value is proposed. The results show that the probability density function (PDF) of the EP signals follows the q-Gaussian distribution. The displacement rate is positively correlated with the q value. As the displacement rate increases, the fracture mode changes, the damage degree intensifies, and the microcrack network becomes denser. The influence of rock type on the q value is related to the burst intensity of energy release and the crack fracture mode. The q value of EP can be used as an effective prediction index for rock failure, similar to the b value of acoustic emission (AE). The results provide a useful reference and method for the monitoring and early warning of geological disasters.
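For reference, the Tsallis (non-extensive) entropy and the q-Gaussian distribution invoked above have standard textbook forms; the recap below uses generic notation that is not copied from the paper.

```latex
% Tsallis entropy of a discrete distribution p_i (k a positive constant);
% it recovers the Boltzmann-Gibbs/Shannon entropy as q -> 1:
S_q = k \, \frac{1 - \sum_i p_i^{\,q}}{q - 1},
\qquad
\lim_{q \to 1} S_q = -k \sum_i p_i \ln p_i .

% The q-Gaussian, which maximizes S_q under suitable constraints, is built
% from the q-exponential, where [z]_+ = \max(z, 0):
f(x) \propto e_q\!\left(-\beta x^2\right),
\qquad
e_q(x) = \left[\, 1 + (1 - q)\, x \,\right]_+^{\frac{1}{1-q}} .
```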
Software project outcomes heavily depend on natural language requirements, often causing diverse interpretations and issues like ambiguities and incomplete or faulty requirements. Researchers are exploring machine learning to predict software bugs, but a more precise and general approach is needed. Accurate bug prediction is crucial for software evolution and user training, prompting an investigation into deep and ensemble learning methods. However, these studies are not generalized and efficient when extended to other datasets. Therefore, this paper proposes a hybrid approach combining multiple techniques to explore their effectiveness on bug identification problems. The methods involve feature selection, which is used to reduce the dimensionality and redundancy of features and select only the relevant ones; transfer learning, which is used to train and test the model on different datasets to analyze how much of the learning is transferred to other datasets; and an ensemble method, which is utilized to explore the increase in performance upon combining multiple classifiers in a model. Four National Aeronautics and Space Administration (NASA) and four Promise datasets are used in the study, showing an increase in the model's performance by providing better Area Under the Receiver Operating Characteristic Curve (AUC-ROC) values when different classifiers were combined. This reveals that an amalgam of techniques such as those used in this study (feature selection, transfer learning, and ensemble methods) proves helpful in optimizing software bug prediction models and provides a high-performing, useful end model.
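As one way to picture the ensemble step described above, several base classifiers can be combined with soft voting in scikit-learn. The sketch below uses synthetic data and an arbitrary choice of base learners, so it illustrates the general idea rather than the exact ensemble used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))                       # synthetic static code metrics
y = (X[:, 0] - X[:, 4] + rng.normal(scale=0.8, size=300) > 0).astype(int)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=100)),
                ("nb", GaussianNB())],
    voting="soft",                                    # average predicted probabilities
)
print("AUC:", cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean())
```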
Cross-Project Defect Prediction (CPDP) is a method that utilizes historical data from other source projects to train predictive models for defect prediction in the target project. However, existing CPDP methods only consider linear correlations between features (indicators) of the source and target projects. These models are not capable of evaluating non-linear correlations between features when they exist, for example, when there are differences in data distributions between the source and target projects. As a result, the performance of such CPDP models is compromised. This paper proposes a novel CPDP method based on the Synthetic Minority Oversampling Technique (SMOTE) and Deep Canonical Correlation Analysis (DCCA), referred to as S-DCCA. Canonical Correlation Analysis (CCA) is employed to address the issue of non-linear correlations between features of the source and target projects. S-DCCA extends CCA by incorporating the MlpNet model for feature extraction from the dataset. The redundant features are then eliminated by maximizing the correlated feature subset using the CCA loss function. Finally, cross-project defect prediction is achieved through the application of the SMOTE data sampling technique. Area Under Curve (AUC) and F1 score (F1) are used as evaluation metrics. Experiments were conducted on 27 projects from four public datasets to validate the proposed method. The results demonstrate that, on average, our method outperforms all baseline approaches by at least 1.2% in AUC and 5.5% in F1 score. This indicates that the proposed method exhibits favorable performance characteristics.
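SMOTE, the oversampling step named above, synthesizes new minority-class samples by interpolating between existing ones. The imbalanced-learn library provides a ready implementation, used here on synthetic data purely for illustration.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
# Imbalanced synthetic defect data: 180 clean modules, 20 defective ones
X = np.vstack([rng.normal(0.0, 1.0, size=(180, 5)),
               rng.normal(1.5, 1.0, size=(20, 5))])
y = np.array([0] * 180 + [1] * 20)

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))   # minority class oversampled to 180
```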
Medical Internet of Things (IoT) devices are becoming more and more common in healthcare. This has created a huge need for advanced predictive health modeling strategies that can make good use of the growing amount of multimodal data to find potential health risks early and help individuals in a personalized way. Existing methods, while useful, have limitations in predictive accuracy, delay, personalization, and user interpretability, requiring a more comprehensive and efficient approach to harness modern medical IoT devices. MAIPFE is a multimodal approach integrating pre-emptive analysis, personalized feature selection, and explainable AI for real-time health monitoring and disease detection. By using AI for early disease detection, personalized health recommendations, and transparency, healthcare can be transformed. The Multimodal Approach Integrating Pre-emptive Analysis, Personalized Feature Selection, and Explainable AI (MAIPFE) framework, which combines the Firefly Optimizer, a Recurrent Neural Network (RNN), Fuzzy C-Means (FCM), and explainable AI, improves disease detection precision over existing methods. Comprehensive metrics show the model's superiority in real-time health analysis. The proposed framework outperformed existing models by 8.3% in disease detection classification precision, 8.5% in accuracy, 5.5% in recall, 2.9% in specificity, 4.5% in AUC (Area Under the Curve), and 4.9% in delay reduction. Disease prediction precision increased by 4.5%, accuracy by 3.9%, recall by 2.5%, specificity by 3.5%, AUC by 1.9%, and delay levels decreased by 9.4%. MAIPFE can revolutionize healthcare with pre-emptive analysis, personalized health insights, and actionable recommendations. The research shows that this innovative approach improves patient outcomes and healthcare efficiency in the real world.
BACKGROUND: Acute pancreatitis in pregnancy (APIP) is a rare and serious condition, and severe APIP (SAPIP) can lead to pancreatic necrosis, abscess, multiple organ dysfunction, and other adverse maternal and infant outcomes. Therefore, early identification or prediction of SAPIP is important. AIM: To assess factors for early identification or prediction of SAPIP. METHODS: The clinical data of patients with APIP were retrospectively analyzed. Patients were classified with mild acute pancreatitis or severe acute pancreatitis, and the clinical characteristics and laboratory biochemical indexes were compared between the two groups. Logistic regression and receiver operating characteristic (ROC) curve analyses were performed to assess the efficacy of the factors for identification or prediction of SAPIP. RESULTS: A total of 45 APIP patients were enrolled. Compared with the mild acute pancreatitis group, the severe acute pancreatitis group had significantly increased (P<0.01) heart rate (HR), hemoglobin, neutrophil ratio (NEUT%), and neutrophil–lymphocyte ratio (NLR), while lymphocytes were significantly decreased (P<0.01). Logistic regression analysis showed that HR, NEUT%, NLR, and lymphocyte count differed significantly (P<0.01) between the groups. These may be factors for early identification or prediction of SAPIP. The area under the curve of HR, NEUT%, NLR, and lymphocyte count in the ROC curve analysis was 0.748, 0.732, 0.821, and 0.774, respectively. The combined analysis showed that the area under the curve, sensitivity, and specificity were 0.869, 90.5%, and 70.8%, respectively. CONCLUSION: HR, NEUT%, NLR, and lymphocyte count can be used for early identification or prediction of SAPIP, and the combination of the four factors is expected to improve identification or prediction of SAPIP.
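The "combined analysis" described above amounts to fitting a logistic regression on the four predictors and scoring the fitted probabilities with a ROC curve. A small scikit-learn sketch on synthetic data follows; the predictor values, sample size, and labels are fabricated for illustration only and are not patient data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 45
# Synthetic stand-ins for HR, NEUT%, NLR, lymphocyte count
X = np.column_stack([rng.normal(100, 15, n), rng.normal(75, 8, n),
                     rng.normal(6, 3, n),    rng.normal(1.5, 0.6, n)])
y = (X[:, 2] + rng.normal(0, 2, n) > 6).astype(int)   # synthetic severe / mild labels

model = LogisticRegression(max_iter=1000).fit(X, y)
combined_score = model.predict_proba(X)[:, 1]          # probability of severe disease
print("Combined AUC:", roc_auc_score(y, combined_score))
```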
Reducing neonatal mortality is a critical global health objective, especially in resource-constrained developing countries. This study employs machine learning (ML) techniques to predict fetal health status based on cardiotocography (CTG) examination findings, utilizing a dataset from the Kaggle repository due to the limited comprehensive healthcare data available in developing nations. Features such as baseline fetal heart rate, uterine contractions, and waveform characteristics were extracted using the RFE wrapper feature engineering technique and scaled with a standard scaler. Six ML models, Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), Categorical Boosting (CB), and Extended Gradient Boosting (XGB), were trained via cross-validation and evaluated using ML performance metrics. Eight of the 21 features selected by GB returned its maximum Matthews Correlation Coefficient (MCC) score of 0.6255, while CB, with 20 of the 21 features, returned the highest overall MCC score of 0.6321. The study demonstrated the ability of ML models to predict fetal health conditions from CTG exam results, facilitating early identification of high-risk pregnancies and enabling prompt treatment to prevent severe neonatal outcomes.
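Cross-validated evaluation with the Matthews Correlation Coefficient, as used above, can be reproduced with scikit-learn's scorer machinery. The sketch below, a gradient-boosting classifier on synthetic data with made-up feature counts, is only a schematic of that evaluation loop.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import matthews_corrcoef, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 21))                      # 21 synthetic CTG-style features
y = (X[:, 0] + 0.8 * X[:, 5] + rng.normal(scale=0.7, size=400) > 0).astype(int)

X = StandardScaler().fit_transform(X)               # standard scaling
selector = RFE(GradientBoostingClassifier(), n_features_to_select=8).fit(X, y)
X_sel = X[:, selector.support_]                     # RFE-selected feature subset

mcc_scorer = make_scorer(matthews_corrcoef)
scores = cross_val_score(GradientBoostingClassifier(), X_sel, y,
                         cv=5, scoring=mcc_scorer)
print("Mean cross-validated MCC:", scores.mean())
```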
Dementia is a disorder with high societal impact and severe consequences for its patients, who suffer from a progressive cognitive decline that leads to increased morbidity, mortality, and disability. Since there is a consensus that dementia is a multifactorial disorder, which portrays changes in the brain of the affected individual as early as 15 years before its onset, prediction models that aim at its early detection and risk identification should consider these characteristics. This study presents a novel method for ten-year prediction of dementia using multifactorial data comprising 75 variables. Two automated diagnostic systems are developed that use genetic algorithms for feature selection, while an artificial neural network and a deep neural network are used for dementia classification. The proposed model based on a genetic algorithm and a deep neural network achieved the best accuracy of 93.36%, sensitivity of 93.15%, specificity of 91.59%, and MCC of 0.4788, and performed better than 11 other machine learning techniques previously presented for dementia prediction. The identified best predictors were: age, past smoking habit, history of infarct, depression, hip fracture, single-leg standing test with the right leg, score in the physical component summary, and history of TIA/RIND. The identification of risk factors is imperative in dementia research as an effort to prevent or delay its onset.
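Genetic-algorithm feature selection of the kind described above typically encodes each candidate feature subset as a bit mask and evolves the population toward masks that maximize a cross-validated fitness. The compact sketch below shows the core loop; the synthetic data, population sizes, and the logistic-regression fitness model are all illustrative assumptions rather than the study's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=300) > 0).astype(int)

def fitness(mask):
    """Cross-validated accuracy of a classifier restricted to the masked features."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))            # random bit masks
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]                # keep the fittest half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])                  # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05               # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("Selected feature indices:", np.flatnonzero(best))
```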
In a competitive digital age where data volumes are increasing with time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques, and to make decisions based on the extracted knowledge, is becoming increasingly important in all business domains. Nevertheless, high-dimensional data remains a major challenge for classification algorithms due to its high computational cost and storage requirements. The 2016 Demographic and Health Survey of Ethiopia (EDHS 2016), the publicly available data source used for this study, contains several features that may not be relevant to the prediction task. In this paper, we developed a hybrid multidimensional metrics framework for predictive modeling that serves both model performance evaluation and feature selection, in order to overcome the feature selection challenges and select the best model among the available DM and ML models. The proposed hybrid metrics were used to measure the efficiency of the predictive models. Experimental results show that the decision tree algorithm is the most efficient model. The higher score of HMM (m, r) = 0.47 indicates an overall significant model that encompasses almost all of the user's requirements, unlike classical metrics that use a single criterion to select the most appropriate model. On the other hand, the ANNs were found to be the most computationally intensive for our prediction task. Moreover, the type of data and the class size of the dataset (unbalanced data) have a significant impact on the efficiency of the model, especially on the computational cost, and can hamper the interpretability of the model's parameters. The efficiency of the predictive model could be further improved with other feature selection algorithms (especially hybrid metrics) that take into account experts of the knowledge domain, as understanding of the business domain has a significant impact.
The development of cloud computing and virtualization technology has brought great challenges to the reliability of data center services. Data centers typically contain a large number of compute and storage nodes which may fail and affect the quality of service. Failure prediction is an important means of ensuring service availability. Predicting node failure in cloud-based data centers is challenging because the reflected failure symptoms have complex characteristics, and a distribution imbalance between failure samples and normal samples is widespread, resulting in inaccurate failure prediction. Targeting these challenges, this paper proposes a novel failure prediction method, FP-STE (Failure Prediction based on Spatio-Temporal feature Extraction). Firstly, an improved recurrent neural network HW-GRU (improved GRU based on HighWay network) and a convolutional neural network (CNN) are used to extract the temporal features and spatial features of multivariate data, respectively, to increase the discrimination of different types of failure symptoms, which improves the accuracy of prediction. Then the intermediate results of the two models are added as features into SCS-XGBoost to predict the possibility and the precise type of node failure in the future. SCS-XGBoost is an ensemble learning model that is improved by the integrated strategy of oversampling and cost-sensitive learning. Experimental results based on real data sets confirm the effectiveness and superiority of FP-STE.
In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear-based, deep learning (LSTM) and ensemble learning (LightGBM) models. These models were trained with four different feature sets and their performances were evaluated in terms of accuracy and F-measure metrics. While the first experiments directly used the stocks' own features as the model inputs, the second experiments utilized stock features reduced through Variational AutoEncoders (VAE). In the last experiments, in order to grasp the effects of the other banking stocks on individual stock performance, the features belonging to other stocks were also given as inputs to our models. Combining other stock features was done for both own (named allstock_own) and VAE-reduced (named allstock_VAE) stock features, and the expanded dimensions of the feature sets were reduced by Recursive Feature Elimination. While the highest success rate, 0.685, was achieved with allstock_own and the LSTM-with-attention model, the combination of allstock_VAE and the LSTM-with-attention model obtained an accuracy rate of 0.675. Although the classification results achieved with both feature types were close, allstock_VAE achieved these results using nearly 16.67% fewer features compared with allstock_own. When all experimental results were examined, it was found that the models trained with allstock_own and allstock_VAE achieved higher accuracy rates than those using individual stock features. It was also concluded that the results obtained with the VAE-reduced stock features were similar to those obtained with the stocks' own features.
Software defect prediction (SDP) is used to perform statistical analysis of historical defect data to find the distribution rule of historical defects, so as to effectively predict defects in new software. However, there are redundant and irrelevant features in software defect datasets that affect the performance of defect predictors. In order to identify and remove the redundant and irrelevant features in software defect datasets, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. Then, the correlation between features is calculated based on symmetric uncertainty. According to the correlation degree, RFC partitions features into k clusters based on the k-medoids algorithm, and finally selects the representative features from each cluster to form the final feature subset. In the experiments, we compare the proposed RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
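Symmetric uncertainty, the correlation measure used above, normalizes mutual information by the two features' entropies: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)). A small sketch for discretized features follows; the discretization into integer bins and the toy data are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(values):
    """Shannon entropy (in nats) of a discrete sequence."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), bounded in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_info_score(x, y) / denom if denom > 0 else 0.0

# Toy discretized software metrics
f1 = np.array([0, 0, 1, 1, 2, 2, 0, 1])
f2 = np.array([0, 0, 1, 1, 2, 2, 1, 1])   # strongly related to f1
f3 = np.array([1, 0, 2, 0, 1, 2, 0, 2])   # weakly related to f1
print(symmetric_uncertainty(f1, f2), symmetric_uncertainty(f1, f3))
```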
文摘The development of defect prediction plays a significant role in improving software quality. Such predictions are used to identify defective modules before the testing and to minimize the time and cost. The software with defects negatively impacts operational costs and finally affects customer satisfaction. Numerous approaches exist to predict software defects. However, the timely and accurate software bugs are the major challenging issues. To improve the timely and accurate software defect prediction, a novel technique called Nonparametric Statistical feature scaled QuAdratic regressive convolution Deep nEural Network (SQADEN) is introduced. The proposed SQADEN technique mainly includes two major processes namely metric or feature selection and classification. First, the SQADEN uses the nonparametric statistical Torgerson–Gower scaling technique for identifying the relevant software metrics by measuring the similarity using the dice coefficient. The feature selection process is used to minimize the time complexity of software fault prediction. With the selected metrics, software fault perdition with the help of the Quadratic Censored regressive convolution deep neural network-based classification. The deep learning classifier analyzes the training and testing samples using the contingency correlation coefficient. The softstep activation function is used to provide the final fault prediction results. To minimize the error, the Nelder–Mead method is applied to solve non-linear least-squares problems. Finally, accurate classification results with a minimum error are obtained at the output layer. Experimental evaluation is carried out with different quantitative metrics such as accuracy, precision, recall, F-measure, and time complexity. The analyzed results demonstrate the superior performance of our proposed SQADEN technique with maximum accuracy, sensitivity and specificity by 3%, 3%, 2% and 3% and minimum time and space by 13% and 15% when compared with the two state-of-the-art methods.
基金supported by the Key Research and Development Program of Heilongjiang,China(Grant No.2022ZX01A25)Cooperative Funding between Huazhong Agricultural University and Shenzhen Institute of Agricultural Genomics(Grant No.SZYJY2022014)+2 种基金Fundamental Research Funds for the Central Universities,Beijing,China(Grant Nos.2662022JC006 and 2662022ZHYJ002)National Natural Science Foundation of China(Grant No.32101819)Huazhong Agriculture University Research Startup Fund,China(Grant Nos.11041810340 and 11041810341).
文摘Pre-harvest yield prediction of ratoon rice is critical for guiding crop interventions in precision agriculture.However,the unique agronomic practice(i.e.,varied stubble height treatment)in rice ratooning could lead to inconsistent rice phenology,which had a significant impact on yield prediction of ratoon rice.Multi-temporal unmanned aerial vehicle(UAV)-based remote sensing can likely monitor ratoon rice productivity and reflect maximum yield potential across growing seasons for improving the yield prediction compared with previous methods.Thus,in this study,we explored the performance of combination of agronomic practice information(API)and single-phase,multi-spectral features[vegetation indices(VIs)and texture(Tex)features]in predicting ratoon rice yield,and developed a new UAV-based method to retrieve yield formation process by using multi-temporal features which were effective in improving yield forecasting accuracy of ratoon rice.The results showed that the integrated use of VIs,Tex and API(VIs&Tex+API)improved the accuracy of yield prediction than single-phase UAV imagery-based feature,with the panicle initiation stage being the best period for yield prediction(R^(2) as 0.732,RMSE as 0.406,RRMSE as 0.101).More importantly,compared with previous multi-temporal UAV-based methods,our proposed multi-temporal method(multi-temporal model VIs&Tex:R^(2) as 0.795,RMSE as 0.298,RRMSE as 0.072)can increase R^(2) by 0.020-0.111 and decrease RMSE by 0.020-0.080 in crop yield forecasting.This study provides an effective method for accurate pre-harvest yield prediction of ratoon rice in precision agriculture,which is of great significance to take timely means for ensuring ratoon rice production and food security.
文摘Financial crisis prediction(FCP)received significant attention in the financial sector for decision-making.Proper forecasting of the number of firms possible to fail is important to determine the growth index and strength of a nation’s economy.Conventionally,numerous approaches have been developed in the design of accurate FCP processes.At the same time,classifier efficacy and predictive accuracy are inadequate for real-time applications.In addition,several established techniques carry out well to any of the specific datasets but are not adjustable to distinct datasets.Thus,there is a necessity for developing an effectual prediction technique for optimum classifier performance and adjustable to various datasets.This paper presents a novel multi-vs.optimization(MVO)based feature selection(FS)with an optimal variational auto encoder(OVAE)model for FCP.The proposed multi-vs.optimization based feature selection with optimal variational auto encoder(MVOFS-OVAE)model mainly aims to accomplish forecasting the financial crisis.For achieving this,the proposed MVOFS-OVAE model primarily pre-processes the financial data using min-max normalization.In addition,the MVOFS-OVAE model designs a feature subset selection process using the MVOFS approach.Followed by,the variational auto encoder(VAE)model is applied for the categorization of financial data into financial crisis or non-financial crisis.Finally,the differential evolution(DE)algorithm is utilized for the parameter tuning of the VAE model.A series of simulations on the benchmark dataset reported the betterment of the MVOFS-OVAE approach over the recent state of art approaches.
基金Project supported by the National Key R&D Program of China (Grant No. 2022YFE03040004)the National Natural Science Foundation of China (Grant No. 51821005)
文摘Predicting disruptions across different tokamaks is necessary for next generation device.Future large-scale tokamaks can hardly tolerate disruptions at high performance discharge,which makes it difficult for current data-driven methods to obtain an acceptable result.A machine learning method capable of transferring a disruption prediction model trained on one tokamak to another is required to solve the problem.The key is a feature extractor which is able to extract common disruption precursor traces in tokamak diagnostic data,and can be easily transferred to other tokamaks.Based on the concerns above,this paper presents a deep feature extractor,namely,the fusion feature extractor(FFE),which is designed specifically for extracting disruption precursor features from common diagnostics on tokamaks.Furthermore,an FFE-based disruption predictor on J-TEXT is demonstrated.The feature extractor is aimed to extracting disruption-related precursors and is designed according to the precursors of disruption and their representations in common tokamak diagnostics.Strong inductive bias on tokamak diagnostics data is introduced.The paper presents the evolution of the neural network feature extractor and its comparison against general deep neural networks,as well as a physics-based feature extraction with a traditional machine learning method.Results demonstrate that the FFE may reach a similar effect with physics-guided manual feature extraction,and obtain a better result compared with other deep learning methods.
基金supported by Institute of Information&Communications Technology Planning&Evaluation(IITP)grant funded by the Korea government(MSIT)(No.RS-2022-00155885).
文摘Cross-project software defect prediction(CPDP)aims to enhance defect prediction in target projects with limited or no historical data by leveraging information from related source projects.The existing CPDP approaches rely on static metrics or dynamic syntactic features,which have shown limited effectiveness in CPDP due to their inability to capture higher-level system properties,such as complex design patterns,relationships between multiple functions,and dependencies in different software projects,that are important for CPDP.This paper introduces a novel approach,a graph-based feature learning model for CPDP(GB-CPDP),that utilizes NetworkX to extract features and learn representations of program entities from control flow graphs(CFGs)and data dependency graphs(DDGs).These graphs capture the structural and data dependencies within the source code.The proposed approach employs Node2Vec to transform CFGs and DDGs into numerical vectors and leverages Long Short-Term Memory(LSTM)networks to learn predictive models.The process involves graph construction,feature learning through graph embedding and LSTM,and defect prediction.Experimental evaluation using nine open-source Java projects from the PROMISE dataset demonstrates that GB-CPDP outperforms state-of-the-art CPDP methods in terms of F1-measure and Area Under the Curve(AUC).The results showcase the effectiveness of GB-CPDP in improving the performance of cross-project defect prediction.
文摘In geometry processing,symmetry research benefits from global geo-metric features of complete shapes,but the shape of an object captured in real-world applications is often incomplete due to the limited sensor resolution,single viewpoint,and occlusion.Different from the existing works predicting symmetry from the complete shape,we propose a learning approach for symmetry predic-tion based on a single RGB-D image.Instead of directly predicting the symmetry from incomplete shapes,our method consists of two modules,i.e.,the multi-mod-al feature fusion module and the detection-by-reconstruction module.Firstly,we build a channel-transformer network(CTN)to extract cross-fusion features from the RGB-D as the multi-modal feature fusion module,which helps us aggregate features from the color and the depth separately.Then,our self-reconstruction net-work based on a 3D variational auto-encoder(3D-VAE)takes the global geo-metric features as input,followed by a prediction symmetry network to detect the symmetry.Our experiments are conducted on three public datasets:ShapeNet,YCB,and ScanNet,we demonstrate that our method can produce reliable and accurate results.
基金the National Key R&D Program of China(No.2021YFB3701705).
文摘This work constructed a machine learning(ML)model to predict the atmospheric corrosion rate of low-alloy steels(LAS).The material properties of LAS,environmental factors,and exposure time were used as the input,while the corrosion rate as the output.6 dif-ferent ML algorithms were used to construct the proposed model.Through optimization and filtering,the eXtreme gradient boosting(XG-Boost)model exhibited good corrosion rate prediction accuracy.The features of material properties were then transformed into atomic and physical features using the proposed property transformation approach,and the dominant descriptors that affected the corrosion rate were filtered using the recursive feature elimination(RFE)as well as XGBoost methods.The established ML models exhibited better predic-tion performance and generalization ability via property transformation descriptors.In addition,the SHapley additive exPlanations(SHAP)method was applied to analyze the relationship between the descriptors and corrosion rate.The results showed that the property transformation model could effectively help with analyzing the corrosion behavior,thereby significantly improving the generalization ability of corrosion rate prediction models.
文摘Highway safety researchers focus on crash injury severity,utilizing deep learning—specifically,deep neural networks(DNN),deep convolutional neural networks(D-CNN),and deep recurrent neural networks(D-RNN)—as the preferred method for modeling accident severity.Deep learning’s strength lies in handling intricate relation-ships within extensive datasets,making it popular for accident severity level(ASL)prediction and classification.Despite prior success,there is a need for an efficient system recognizing ASL in diverse road conditions.To address this,we present an innovative Accident Severity Level Prediction Deep Learning(ASLP-DL)framework,incorporating DNN,D-CNN,and D-RNN models fine-tuned through iterative hyperparameter selection with Stochastic Gradient Descent.The framework optimizes hidden layers and integrates data augmentation,Gaussian noise,and dropout regularization for improved generalization.Sensitivity and factor contribution analyses identify influential predictors.Evaluated on three diverse crash record databases—NCDB 2018–2019,UK 2015–2020,and US 2016–2021—the D-RNN model excels with an ACC score of 89.0281%,a Roc Area of 0.751,an F-estimate of 0.941,and a Kappa score of 0.0629 over the NCDB dataset.The proposed framework consistently outperforms traditional methods,existing machine learning,and deep learning techniques.
基金financially supported by the National Key Research and Development Program of China(2022YFB3706800,2020YFB1710100)the National Natural Science Foundation of China(51821001,52090042,52074183)。
文摘The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency,which includes the random forest(RF)classification model,the feature importance analysis,and the process parameters optimization with Monte Carlo simulation.The collected data includes four types of defects and corresponding process parameters were used to construct the RF model.Classification results show a recall rate above 90% for all categories.The Gini Index was used to assess the importance of the process parameters in the formation of various defects in the RF model.Finally,the classification model was applied to different production conditions for quality prediction.In the case of process parameters optimization for gas porosity defects,this model serves as an experimental process in the Monte Carlo method to estimate a better temperature distribution.The prediction model,when applied to the factory,greatly improved the efficiency of defect detection.Results show that the scrap rate decreased from 10.16% to 6.68%.
基金supported by National Key R&D Program of China(2022YFC3004705)the National Natural Science Foundation of China(Nos.52074280,52227901 and 52204249)+1 种基金the Postgraduate Research&Practice Innovation Program of Jiangsu Province(No.KYCX24_2913)the Graduate Innovation Program of China University of Mining and Technology(No.2024WLKXJ139).
文摘Rock failure can cause serious geological disasters,and the non-extensive statistical features of electric potential(EP)are expected to provide valuable information for disaster prediction.In this paper,the uniaxial compression experiments with EP monitoring were carried out on fine sandstone,marble and granite samples under four displacement rates.The Tsallis entropy q value of EPs is used to analyze the selforganization evolution of rock failure.Then the influence of displacement rate and rock type on q value are explored by mineral structure and fracture modes.A self-organized critical prediction method with q value is proposed.The results show that the probability density function(PDF)of EPs follows the q-Gaussian distribution.The displacement rate is positively correlated with q value.With the displacement rate increasing,the fracture mode changes,the damage degree intensifies,and the microcrack network becomes denser.The influence of rock type on q value is related to the burst intensity of energy release and the crack fracture mode.The q value of EPs can be used as an effective prediction index for rock failure like b value of acoustic emission(AE).The results provide useful reference and method for the monitoring and early warning of geological disasters.
基金This Research is funded by Researchers Supporting Project Number(RSPD2024R947),King Saud University,Riyadh,Saudi Arabia.
文摘Software project outcomes heavily depend on natural language requirements,often causing diverse interpretations and issues like ambiguities and incomplete or faulty requirements.Researchers are exploring machine learning to predict software bugs,but a more precise and general approach is needed.Accurate bug prediction is crucial for software evolution and user training,prompting an investigation into deep and ensemble learning methods.However,these studies are not generalized and efficient when extended to other datasets.Therefore,this paper proposed a hybrid approach combining multiple techniques to explore their effectiveness on bug identification problems.The methods involved feature selection,which is used to reduce the dimensionality and redundancy of features and select only the relevant ones;transfer learning is used to train and test the model on different datasets to analyze how much of the learning is passed to other datasets,and ensemble method is utilized to explore the increase in performance upon combining multiple classifiers in a model.Four National Aeronautics and Space Administration(NASA)and four Promise datasets are used in the study,showing an increase in the model’s performance by providing better Area Under the Receiver Operating Characteristic Curve(AUC-ROC)values when different classifiers were combined.It reveals that using an amalgam of techniques such as those used in this study,feature selection,transfer learning,and ensemble methods prove helpful in optimizing the software bug prediction models and providing high-performing,useful end mode.
基金NationalNatural Science Foundation of China,Grant/AwardNumber:61867004National Natural Science Foundation of China Youth Fund,Grant/Award Number:41801288.
文摘Cross-Project Defect Prediction(CPDP)is a method that utilizes historical data from other source projects to train predictive models for defect prediction in the target project.However,existing CPDP methods only consider linear correlations between features(indicators)of the source and target projects.These models are not capable of evaluating non-linear correlations between features when they exist,for example,when there are differences in data distributions between the source and target projects.As a result,the performance of such CPDP models is compromised.In this paper,this paper proposes a novel CPDP method based on Synthetic Minority Oversampling Technique(SMOTE)and Deep Canonical Correlation Analysis(DCCA),referred to as S-DCCA.Canonical Correlation Analysis(CCA)is employed to address the issue of non-linear correlations between features of the source and target projects.S-DCCA extends CCA by incorporating the MlpNet model for feature extraction from the dataset.The redundant features are then eliminated by maximizing the correlated feature subset using the CCA loss function.Finally,cross-project defect prediction is achieved through the application of the SMOTE data sampling technique.Area Under Curve(AUC)and F1 scores(F1)are used as evaluation metrics.This paper conducted experiments on 27 projects from four public datasets to validate the proposed method.The results demonstrate that,on average,our method outperforms all baseline approaches by at least 1.2%in AUC and 5.5%in F1 score.This indicates that the proposed method exhibits favorable performance characteristics.
文摘Medical Internet of Things(IoT)devices are becoming more and more common in healthcare.This has created a huge need for advanced predictive health modeling strategies that can make good use of the growing amount of multimodal data to find potential health risks early and help individuals in a personalized way.Existing methods,while useful,have limitations in predictive accuracy,delay,personalization,and user interpretability,requiring a more comprehensive and efficient approach to harness modern medical IoT devices.MAIPFE is a multimodal approach integrating pre-emptive analysis,personalized feature selection,and explainable AI for real-time health monitoring and disease detection.By using AI for early disease detection,personalized health recommendations,and transparency,healthcare will be transformed.The Multimodal Approach Integrating Pre-emptive Analysis,Personalized Feature Selection,and Explainable AI(MAIPFE)framework,which combines Firefly Optimizer,Recurrent Neural Network(RNN),Fuzzy C Means(FCM),and Explainable AI,improves disease detection precision over existing methods.Comprehensive metrics show the model’s superiority in real-time health analysis.The proposed framework outperformed existing models by 8.3%in disease detection classification precision,8.5%in accuracy,5.5%in recall,2.9%in specificity,4.5%in AUC(Area Under the Curve),and 4.9%in delay reduction.Disease prediction precision increased by 4.5%,accuracy by 3.9%,recall by 2.5%,specificity by 3.5%,AUC by 1.9%,and delay levels decreased by 9.4%.MAIPFE can revolutionize healthcare with preemptive analysis,personalized health insights,and actionable recommendations.The research shows that this innovative approach improves patient outcomes and healthcare efficiency in the real world.
文摘BACKGROUND Acute pancreatitis in pregnancy(APIP)is a rare and serious condition,and severe APIP(SAPIP)can lead to pancreatic necrosis,abscess,multiple organ dysfunction,and other adverse maternal and infant outcomes.Therefore,early identification or prediction of SAPIP is important.AIM To assess factors for early identification or prediction of SAPIP.METHODS The clinical data of patients with APIP were retrospectively analyzed.Patients were classified with mild acute pancreatitis or severe acute pancreatitis,and the clinical characteristics and laboratory biochemical indexes were compared between the two groups.Logical regression and receiver operating characteristic curve analyses were performed to assess the efficacy of the factors for identification or prediction of SAPIP.RESULTS A total of 45 APIP patients were enrolled.Compared with the mild acute pancreatitis group,the severe acute pancreatitis group had significantly increased(P<0.01)heart rate(HR),hemoglobin,neutrophil ratio(NEUT%),and neutrophil–lymphocyte ratio(NLR),while lymphocytes were significantly decreased(P<0.01).Logical regression analysis showed that HR,NEUT%,NLR,and lymphocyte count differed significantly(P<0.01)between the groups.These may be factors for early identification or prediction of SAPIP.The area under the curve of HR,NEUT%,NLR,and lymphocyte count in the receiver operating characteristic curve analysis was 0.748,0.732,0.821,and 0.774,respectively.The combined analysis showed that the area under the curve,sensitivity,and specificity were 0.869,90.5%,and 70.8%,respectively.CONCLUSION HR,NEUT%,NLR,and lymphocyte count can be used for early identification or prediction of SAPIP,and the combination of the four factors is expected to improve identification or prediction of SAPIP.
Reducing neonatal mortality is a critical global health objective, especially in resource-constrained developing countries. This study employs machine learning (ML) techniques to predict fetal health status from cardiotocography (CTG) examination findings, using a dataset from the Kaggle repository because comprehensive healthcare data from developing nations is limited. Features such as baseline fetal heart rate, uterine contractions, and waveform characteristics were selected with the Recursive Feature Elimination (RFE) wrapper technique and scaled with a standard scaler. Six ML models, Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), Categorical Boosting (CB), and Extreme Gradient Boosting (XGB), were trained via cross-validation and evaluated using ML performance metrics. Eight of the 21 features selected by GB yielded its maximum Matthews Correlation Coefficient (MCC) score of 0.6255, while CB, with 20 of the 21 features, returned the highest overall MCC score of 0.6321. The study demonstrates the ability of ML models to predict fetal health conditions from CTG exam results, facilitating early identification of high-risk pregnancies and enabling prompt treatment to prevent severe neonatal outcomes.
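A hedged sketch of the described pipeline is shown below: RFE wrapper feature selection, standard scaling, and cross-validated MCC for a gradient-boosting classifier. The Kaggle file name fetal_health.csv, the target column name, and the choice of 8 retained features are assumptions based on the abstract, not the authors' exact setup.

```python
# Illustrative CTG pipeline: scaling, RFE feature selection, cross-validated MCC.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import matthews_corrcoef, make_scorer

df = pd.read_csv("fetal_health.csv")                      # assumed Kaggle file name
X, y = df.drop(columns=["fetal_health"]), df["fetal_health"]

pipe = Pipeline([
    ("scale", StandardScaler()),
    # keep 8 features, matching the GB result reported in the abstract
    ("rfe", RFE(GradientBoostingClassifier(random_state=0), n_features_to_select=8)),
    ("clf", GradientBoostingClassifier(random_state=0)),
])
mcc = make_scorer(matthews_corrcoef)
scores = cross_val_score(pipe, X, y, cv=5, scoring=mcc)
print("mean cross-validated MCC:", scores.mean())
```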
Dementia is a disorder with high societal impact and severe consequences for its patients, who suffer a progressive cognitive decline that leads to increased morbidity, mortality, and disability. Since there is a consensus that dementia is a multifactorial disorder whose changes appear in the affected individual's brain as early as 15 years before onset, prediction models aimed at early detection and risk identification should account for these characteristics. This study presents a novel method for ten-year prediction of dementia using multifactorial data comprising 75 variables. Two automated diagnostic systems were developed that use genetic algorithms for feature selection, with an artificial neural network and a deep neural network used for dementia classification. The proposed model based on a genetic algorithm and a deep neural network achieved the best accuracy of 93.36%, sensitivity of 93.15%, specificity of 91.59%, and MCC of 0.4788, outperforming 11 other machine learning techniques previously presented for dementia prediction. The best predictors identified were age, past smoking habit, history of infarct, depression, hip fracture, single-leg standing test with the right leg, score on the physical component summary, and history of TIA/RIND. Identifying risk factors is imperative in dementia research as an effort to prevent or delay its onset.
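The following simplified sketch shows how genetic-algorithm feature selection with a neural-network fitness function can be wired together. scikit-learn's MLPClassifier stands in for the paper's deep neural network, and the synthetic 75-variable data, population size, and mutation rate are illustrative assumptions.

```python
# Toy genetic algorithm over binary feature masks with a neural-net fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=300, n_features=75, n_informative=10, random_state=42)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=200, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(8, X.shape[1]))            # random feature masks
for generation in range(3):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:4]]            # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])                   # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.random(X.shape[1]) < 0.02] ^= 1           # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected feature indices:", np.flatnonzero(best))
```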
In a competitive digital age where data volumes grow over time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques, and to make decisions based on that knowledge, is becoming increasingly important in all business domains. Nevertheless, high-dimensional data remains a major challenge for classification algorithms due to its high computational cost and storage requirements. The publicly available 2016 Demographic and Health Survey of Ethiopia (EDHS 2016), used as the data source for this study, contains several features that may not be relevant to the prediction task. In this paper, we developed a hybrid multidimensional metrics framework for predictive modeling that covers both model performance evaluation and feature selection, in order to overcome the feature selection challenges and select the best among the available DM and ML models. The proposed hybrid metrics were used to measure the efficiency of the predictive models. Experimental results show that the decision tree algorithm is the most efficient model: its higher score of HMM(m, r) = 0.47 indicates an overall significant model that satisfies almost all of the user's requirements, unlike classical metrics that rely on a single criterion to select the most appropriate model. The ANNs, on the other hand, were the most computationally intensive for our prediction task. Moreover, the type of data and the class distribution of the dataset (unbalanced data) have a significant impact on model efficiency, especially on computational cost, and can hamper the interpretability of the model's parameters. The efficiency of the predictive model could be further improved with other feature selection algorithms (especially hybrid metrics) that involve knowledge-domain experts, since understanding of the business domain has a significant impact.
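The abstract reports the hybrid metric HMM(m, r) without giving its formula, so the sketch below only illustrates the general idea of a composite score that weighs predictive quality against normalized training cost. The weights, models, and synthetic imbalanced data are assumptions, not the authors' definition.

```python
# Illustrative composite "quality vs. cost" score for comparing models.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=40, weights=[0.8, 0.2], random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=1)

results = {}
for name, model in {"decision_tree": DecisionTreeClassifier(random_state=1),
                    "ann": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=1)}.items():
    start = time.perf_counter()
    model.fit(Xtr, ytr)
    cost = time.perf_counter() - start                    # training time as the cost proxy
    quality = f1_score(yte, model.predict(Xte))
    results[name] = (quality, cost)

max_cost = max(c for _, c in results.values())
for name, (quality, cost) in results.items():
    hybrid = 0.7 * quality + 0.3 * (1 - cost / max_cost)  # illustrative weighting only
    print(f"{name}: F1={quality:.3f}, fit time={cost:.3f}s, composite score={hybrid:.3f}")
```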
Funding: Supported in part by the National Key Research and Development Program of China (2019YFB2103200), NSFC (61672108), the Open Subject Funds of Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory (SKX182010049), the Fundamental Research Funds for the Central Universities (5004193192019PTB-019), and the Industrial Internet Innovation and Development Project 2018 of China.
The development of cloud computing and virtualization technology has brought great challenges to the reliability of data center services. Data centers typically contain a large number of compute and storage nodes that may fail and affect the quality of service, so failure prediction is an important means of ensuring service availability. Predicting node failure in cloud-based data centers is challenging because the observed failure symptoms have complex characteristics and the distribution imbalance between failure samples and normal samples is widespread, leading to inaccurate failure prediction. Targeting these challenges, this paper proposes a novel failure prediction method, FP-STE (Failure Prediction based on Spatio-Temporal feature Extraction). First, an improved recurrent neural network, HW-GRU (improved GRU based on HighWay network), and a convolutional neural network (CNN) are used to extract the temporal and spatial features of multivariate data, respectively, increasing the discrimination between different types of failure symptoms and thereby improving prediction accuracy. The intermediate results of the two models are then added as features into SCS-XGBoost to predict the probability and the precise type of node failure in the future. SCS-XGBoost is an ensemble learning model improved by an integrated strategy of oversampling and cost-sensitive learning. Experimental results based on real data sets confirm the effectiveness and superiority of FP-STE.
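A minimal sketch of the fusion-and-classification stage is given below: placeholder arrays stand in for the HW-GRU temporal features and CNN spatial features, and the imbalance is handled with random oversampling (imbalanced-learn) plus a cost-sensitive weight in XGBoost. The extractors, hyperparameters, and synthetic labels are assumptions rather than the FP-STE implementation.

```python
# Illustrative fusion of deep features with oversampled, cost-sensitive XGBoost.
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(7)
n = 2000
temporal_features = rng.normal(size=(n, 64))    # placeholder for HW-GRU output
spatial_features = rng.normal(size=(n, 32))     # placeholder for CNN output
X = np.hstack([temporal_features, spatial_features])
# Synthetic, heavily imbalanced failure labels (~5% positives)
y = (X[:, 0] + 0.5 * X[:, 64] + rng.normal(0, 1, n) > 2.5).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=7)
Xtr, ytr = RandomOverSampler(random_state=7).fit_resample(Xtr, ytr)  # oversampling step

# scale_pos_weight makes missed failures costlier than misclassified normal samples
clf = XGBClassifier(n_estimators=200, scale_pos_weight=3.0, eval_metric="logloss")
clf.fit(Xtr, ytr)
print(classification_report(yte, clf.predict(Xte)))
```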
In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear-based, deep-learning (LSTM), and ensemble-learning (LightGBM) models. These models were trained with four different feature sets and their performance was evaluated in terms of accuracy and F-measure. While the first experiments used each stock's own features directly as model inputs, the second experiments used stock features reduced through Variational AutoEncoders (VAE). In the last experiments, to capture the effects of the other banking stocks on individual stock performance, the features of the other stocks were also given as inputs to the models. Other-stock features were combined with both the own (allstock_own) and VAE-reduced (allstock_VAE) stock features, and the expanded feature sets were reduced by Recursive Feature Elimination. The highest success rate, 0.685, was obtained with allstock_own and the LSTM-with-attention model, while the combination of allstock_VAE and the LSTM-with-attention model reached an accuracy of 0.675. Although the classification results achieved with the two feature types were close, allstock_VAE achieved them using nearly 16.67% fewer features than allstock_own. Across all experiments, the models trained with allstock_own and allstock_VAE achieved higher accuracy than those using individual stock features, and the results obtained with the VAE-reduced stock features were similar to those obtained with the own stock features.
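One experiment variant can be sketched as follows, under stated assumptions: synthetic hourly features for eight stocks are combined into an "allstock" input, trimmed with Recursive Feature Elimination, and classified with LightGBM. The real study's Borsa Istanbul data, the VAE reduction, and the LSTM-with-attention model are not reproduced here.

```python
# Illustrative "allstock" feature set + RFE + LightGBM direction classifier.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

rng = np.random.default_rng(3)
n_hours, n_stocks, n_feats = 3000, 8, 12
all_stock_features = rng.normal(size=(n_hours, n_stocks * n_feats))  # combined inputs
direction = (rng.random(n_hours) > 0.5).astype(int)                  # up/down label (synthetic)

# shuffle=False keeps the chronological order of the hourly samples
Xtr, Xte, ytr, yte = train_test_split(all_stock_features, direction, shuffle=False)

selector = RFE(LGBMClassifier(n_estimators=100), n_features_to_select=30)
Xtr_sel = selector.fit_transform(Xtr, ytr)
Xte_sel = selector.transform(Xte)

clf = LGBMClassifier(n_estimators=300).fit(Xtr_sel, ytr)
pred = clf.predict(Xte_sel)
print("accuracy:", accuracy_score(yte, pred), "F1:", f1_score(yte, pred))
```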
基金supported by the National Key Research and Development Program of China(2018YFB1003702)the National Natural Science Foundation of China(62072255).
Software defect prediction (SDP) performs statistical analysis of historical defect data to discover the distribution of historical defects and thereby effectively predict defects in new software. However, software defect datasets contain redundant and irrelevant features that degrade the performance of defect predictors. To identify and remove these redundant and irrelevant features, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. The correlation between features is calculated based on symmetric uncertainty; according to the correlation degree, RFC partitions the features into k clusters using the k-medoids algorithm and finally selects representative features from each cluster to form the final feature subset. In the experiments, we compare RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under the curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
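A simplified sketch of the cluster-based selection idea follows: symmetric uncertainty between discretized features defines a distance, features are grouped with k-medoids, and the medoid of each cluster is kept as its representative. This is an illustration rather than the authors' RFC implementation; scikit-learn-extra's KMedoids and the synthetic data are assumptions, and the ReliefF ranking step is omitted.

```python
# Illustrative symmetric-uncertainty distance + k-medoids feature clustering.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.metrics import mutual_info_score
from sklearn_extra.cluster import KMedoids

def symmetric_uncertainty(a, b):
    # SU(a, b) = 2 * I(a; b) / (H(a) + H(b)), computed on discretized columns
    mi = mutual_info_score(a, b)
    ha, hb = mutual_info_score(a, a), mutual_info_score(b, b)
    return 2.0 * mi / (ha + hb) if (ha + hb) > 0 else 0.0

X, y = make_classification(n_samples=400, n_features=20, n_informative=6, random_state=0)
Xd = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile").fit_transform(X)

n_feats = Xd.shape[1]
distance = np.zeros((n_feats, n_feats))
for i in range(n_feats):
    for j in range(i + 1, n_feats):
        d = 1.0 - symmetric_uncertainty(Xd[:, i], Xd[:, j])  # highly correlated => close
        distance[i, j] = distance[j, i] = d

k = 5
clusters = KMedoids(n_clusters=k, metric="precomputed", random_state=0).fit(distance)
print("representative feature indices:", clusters.medoid_indices_)
```

Keeping one representative per cluster removes redundant features while preserving coverage of the distinct feature groups, which is the intuition behind the RFC approach described above.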