Offline Urdu Nastaleeq text recognition has long been a difficult problem because of the script's highly cursive nature. To avoid character segmentation problems, many researchers are shifting their focus towards segmentation-free, ligature-based recognition approaches. The majority of prevalent ligature-based recognition systems rely heavily on hand-engineered feature extraction techniques. However, such techniques are error prone and may often lead to a loss of useful information that can hardly be recovered later by any manual features. Most prevalent Urdu Nastaleeq text recognition systems were trained and tested on small sets. This paper proposes the use of stacked denoising autoencoders for automatic feature extraction directly from the raw pixel values of ligature images. Such deep learning networks have not been applied to the recognition of Urdu text thus far. Different stacked denoising autoencoders have been trained on 178573 ligatures with 3732 classes from the un-degraded (noise-free) UPTI (Urdu Printed Text Image) data set. Subsequently, the trained networks are validated and tested on degraded versions of the UPTI data set. The experimental results demonstrate accuracies in the range of 93% to 96%, which are better than those of existing Urdu OCR systems for such a large dataset of ligatures.
Invoice document digitization is crucial for efficient management in industries. Scanned invoice images are often noisy for various reasons, which affects OCR (optical character recognition) detection accuracy. In this paper, letter data obtained from images of invoices are denoised using a modified autoencoder-based deep learning method. A stacked denoising autoencoder (SDAE) is implemented with two hidden layers each in the encoder network and the decoder network. To capture the most salient features of the training samples, an undercomplete autoencoder is designed with non-linear encoder and decoder functions. This autoencoder is regularized for the denoising application using a combined loss function that considers both mean square error and binary cross entropy. A dataset of 59,119 letter images, containing both English letters (upper and lower case) and digits (0 to 9), is prepared from many scanned invoice images and Windows TrueType (.ttf) files and is used for training the neural network. Performance is analyzed in terms of Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM) and Universal Image Quality Index (UQI) and compared with other filtering techniques such as the Nonlocal Means filter, Anisotropic diffusion filter, Gaussian filters and Mean filters. The denoising performance of the proposed SDAE is also compared with an existing SDAE with a single loss function in terms of SNR and PSNR values. Results show the superior performance of the proposed SDAE method.
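As an illustration of a loss that combines mean square error and binary cross entropy, a minimal NumPy sketch follows. The abstract does not say how the two terms are weighted, so the mixing weight `alpha` and the function name `combined_loss` are assumptions made here for illustration; pixel values are assumed to lie in [0, 1].

```python
import numpy as np

def combined_loss(clean, recon, alpha=0.5, eps=1e-7):
    """Weighted sum of mean squared error and binary cross entropy.

    `alpha` is a hypothetical mixing weight (not taken from the paper).
    Both arrays are assumed to hold pixel intensities in [0, 1].
    """
    clean = np.asarray(clean, dtype=float)
    # Clip the reconstruction so that log() never sees 0 or 1 exactly.
    recon = np.clip(np.asarray(recon, dtype=float), eps, 1.0 - eps)
    mse = np.mean((clean - recon) ** 2)
    bce = -np.mean(clean * np.log(recon) + (1.0 - clean) * np.log(1.0 - recon))
    return alpha * mse + (1.0 - alpha) * bce
```

A perfect reconstruction of a binary glyph gives a loss close to zero, while a uniformly grey reconstruction is penalized by both terms.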
Time series analysis is a key technology for medical diagnosis, weather forecasting and financial prediction systems. However, missing data frequently occur during data recording, posing a great challenge to data mining tasks. In this study, we propose a novel time series data representation-based denoising autoencoder (DAE) for the reconstruction of missing values. Two data representation methods, namely, the recurrence plot (RP) and the Gramian angular field (GAF), are used to transform the raw time series into a 2D matrix for establishing the temporal correlations between different time intervals and extracting the structural patterns from the time series. An improved DAE is then proposed to reconstruct the missing values from the 2D representation of the time series. A comprehensive comparison is conducted amongst the different representations on standard datasets. Results show that the 2D representations have a lower reconstruction error than the raw time series, and the RP representation provides the best outcome. This work provides useful insights into the better reconstruction of missing values in time series analysis to considerably improve the reliability of time-varying systems.
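The two 2D representations named above can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' code: the recurrence plot is computed on the scalar series without time-delay embedding, the threshold `eps` is an assumed parameter, and the Gramian angular field is the summation variant, cos(phi_i + phi_j), after rescaling the series to [-1, 1].

```python
import numpy as np

def recurrence_plot(x, eps=0.1):
    """Binary recurrence matrix: entry (i, j) is 1 when |x_i - x_j| <= eps."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(float)

def gramian_angular_field(x):
    """Gramian angular summation field of a 1D series."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span > 0:
        scaled = 2.0 * (x - x.min()) / span - 1.0  # rescale to [-1, 1]
    else:
        scaled = np.zeros_like(x)  # constant series: map everything to 0
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))  # polar-angle encoding
    return np.cos(phi[:, None] + phi[None, :])
```

Both functions turn a series of length n into an n x n matrix, which is the kind of input a 2D denoising autoencoder would then reconstruct.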
Wireless sensor networks are increasingly used in sensitive event monitoring. However, various abnormal data generated by sensors greatly decrease the accuracy of event detection. Although many methods have been proposed to deal with abnormal data, they generally detect and/or repair all abnormal data without further differentiation. In fact, besides the abnormal data caused by events, it is well known that sensor nodes are prone to generating abnormal data due to factors such as sensor hardware drawbacks and random effects of external sources. Dealing with all abnormal data without differentiation will result in false detection or missed detection of events. In this paper, we propose a data cleaning approach based on Stacked Denoising Autoencoders (SDAE) and multi-sensor collaboration. We detect all abnormal data with the SDAE and then differentiate the abnormal data through multi-sensor collaboration. The abnormal data caused by events are left unchanged, while the abnormal data caused by other factors are repaired. Simulations based on real data show the efficiency of the proposed approach.
Extreme learning machine (ELM) is a feedforward neural network-based machine learning method that has the benefits of short training times and strong generalization capabilities and does not fall into local minima. However, because of the shallow architecture of the traditional ELM, it requires a large number of hidden nodes to ensure its classification performance when dealing with high-dimensional data sets. In addition, its classification performance degrades easily under interference from noisy data. To address these problems, this paper proposes a double pseudo-inverse extreme learning machine (DPELM) based on a Sparse Denoising AutoEncoder (SDAE), namely, SDAE-DPELM. The algorithm can directly determine the input weights and output weights of the network by using the pseudo-inverse method. As a result, the algorithm requires only a few hidden layer nodes to produce superior classification results. Its combination with SDAE can effectively improve classification performance and noise resistance. Extensive numerical experiments show that the algorithm has high classification accuracy and good robustness when dealing with both noisy and noiseless high-dimensional data. Furthermore, applying the algorithm to Miao character recognition substantiates its excellent performance, which further illustrates the practicability of the algorithm.
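For readers unfamiliar with the pseudo-inverse step, the sketch below shows the classical ELM, in which the input weights are random and only the output weights are solved with the Moore-Penrose pseudo-inverse. It does not reproduce the DPELM of the abstract, which also determines the input weights by pseudo-inverse; the function names and the tanh activation are assumptions chosen for illustration.

```python
import numpy as np

def train_elm(X, Y, n_hidden=20, seed=0):
    """Classical ELM: random hidden layer, output weights by pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (not trained)
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y                 # least-squares output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Apply the fixed hidden layer, then the solved output weights."""
    return np.tanh(X @ W + b) @ beta
```

Because the output weights come from a single linear solve, there is no iterative training loop, which is where the short training time mentioned above comes from.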
Object detection, one of the core research topics in computer vision, is extensively used in various industrial activities. Although there have been many studies of daytime images, in which objects can be easily detected, there is relatively little research on nighttime images. At night, various types of noise, such as darkness, haze, and light blur, degrade image quality. Thus, an appropriate noise-removal process must precede object detection to improve its performance. Although there are many studies on removing individual types of noise, only a few handle multiple types of noise simultaneously. In this paper, we propose a convolutional denoising autoencoder (CDAE)-based architecture trained on various types of noise. We also present various ways of composing the modules for each type of noise to improve object detection performance on night images. On the Exclusively Dark (ExDark) Image dataset, experimental results show that the Sequential filtering architecture achieved superior mean average precision (mAP) compared to the other architectures.
Effective clinical trials are necessary for understanding medical advances, but early termination of trials can result in unnecessary waste of resources. Survival models can be used to predict survival probabilities in such trials. However, survival data from clinical trials are sparse, and DeepSurv cannot accurately capture their effective features, which weakens the generalization of the models and decreases their prediction accuracy. In this paper, we propose a survival prediction model for clinical trial completion based on the combination of denoising autoencoder (DAE) and DeepSurv models. The DAE is used to obtain a robust representation of features by breaking the loop of raw features after autoencoder training, and the robust features are then provided to DeepSurv as input for training. The clinical trial dataset for training the model was obtained from the ClinicalTrials.gov dataset. A study of clinical trial completion in pregnant women was conducted in response to the fact that many current clinical trials exclude pregnant women. The experimental results showed that the denoising autoencoder and deep survival regression (DAE-DSR) model was able to extract meaningful and robust features for survival analysis; the C-index values on the training and test datasets were 0.74 and 0.75, respectively. Compared with the Cox proportional hazards model and the DeepSurv model, the survival analysis curves obtained with the DAE-DSR model had more prominent features, and the model was more robust and performed better in actual prediction.
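The C-index reported above measures how often a model ranks pairs of subjects in the right order. A minimal pure-Python sketch of Harrell's concordance index is given below; the function name and the convention that a higher risk score means an earlier event are assumptions of this sketch, and the quadratic loop is for clarity rather than speed.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the reported 0.74 and 0.75 in context.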
Software defect prediction plays a very important role in software quality assurance, which aims to inspect as many potentially defect-prone software modules as possible. However, the performance of the prediction model is susceptible to the high dimensionality of datasets that contain irrelevant and redundant features. In addition, software metrics for software defect prediction are almost entirely traditional features, in contrast to the deep semantic feature representations produced by deep learning techniques. To address these two issues, we propose two solutions in this paper: (1) We leverage a novel non-linear manifold learning method, SOINN Landmark Isomap (SL-Isomap), to extract representative features by automatically selecting a reasonable number and position of landmarks, which can reveal the complex intrinsic structure hidden behind the defect data. (2) We propose a novel defect prediction model named DLDD based on hybrid deep learning techniques, which leverages a denoising autoencoder to learn true input features that are not contaminated by noise and utilizes a deep neural network to learn abstract deep semantic features. We combine the squared error loss function of the denoising autoencoder with the cross entropy loss function of the deep neural network and achieve the best prediction performance by adjusting a hyperparameter. We compare SL-Isomap with seven state-of-the-art feature extraction methods and compare the DLDD model with six baseline models across 20 open source software projects. The experimental results verify the superiority of SL-Isomap and DLDD on four evaluation indicators.
In recent years, numerous works have been proposed that leverage deep learning techniques to improve social-aware recommendation performance. In most cases, a large amount of data is required to train a robust deep learning model, which contains many parameters to fit the training data. However, both user rating data and social network data suffer from critical sparsity, which makes it difficult to train a robust deep neural network model. To address this problem, we propose a novel correlative denoising autoencoder (CoDAE) method that takes correlations between users with multiple roles into account to learn robust representations from sparse inputs of ratings and social networks for recommendation. We develop the CoDAE model by utilizing three separate autoencoders to learn user features with the roles of rater, truster and trustee, respectively. In particular, because each input unit of the user vectors with the roles of truster and trustee corresponds to a particular user, we propose to utilize shared parameters to learn common information of the units that correspond to the same users. Moreover, we propose a related regularization term to learn correlations between the user features learnt by the three subnetworks of the CoDAE model. We further conduct a series of experiments to evaluate the proposed method on two public datasets for the Top-N recommendation task. The experimental results demonstrate that the proposed model outperforms state-of-the-art algorithms on the rank-sensitive metrics MAP and NDCG.
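The two rank-sensitive metrics named above can be sketched as follows for a single user's ranked list. This is an illustrative implementation under stated assumptions: relevance is given in ranked order, NDCG uses the common log2 discount, and average precision is normalized by the number of relevant items that appear in the list; MAP is the mean of this value over users. The function names are chosen here for illustration.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items of a ranked list."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1)
    return float(np.sum(rel / discounts))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def average_precision(relevances):
    """Mean of precision@rank taken at each relevant position."""
    rel = np.asarray(relevances) > 0
    if not rel.any():
        return 0.0
    hits = np.cumsum(rel)                   # relevant items seen so far
    ranks = np.arange(1, rel.size + 1)
    return float(np.sum((hits / ranks)[rel]) / rel.sum())
```

Both metrics reward placing relevant items near the top of the list, which is why they suit the Top-N recommendation task.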
Precise biomarker development is a key step in disease management. However, most published biomarkers were derived from a relatively small number of samples with supervised approaches. Recent advances in unsupervised machine learning promise to leverage very large datasets for making better predictions of disease biomarkers. The denoising autoencoder (DA) is an unsupervised deep learning algorithm and a stochastic version of the autoencoder. The principle of the DA is to force the hidden layer of the autoencoder to capture more robust features by reconstructing a clean input from a corrupted one. Here, a DA model was applied to analyze integrated transcriptomic data from 13 published lung cancer studies, which consisted of 1916 human lung tissue samples. Using the DA, we discovered a molecular signature composed of multiple genes for lung adenocarcinoma (ADC). In independent validation cohorts, the proposed molecular signature proved to be an effective classifier for lung cancer histological subtypes. This signature also successfully predicts clinical outcome in lung ADC, independently of traditional prognostic factors. More importantly, this signature exhibits superior prognostic power compared with other published prognostic genes. Our study suggests that unsupervised learning is helpful for biomarker development in the era of precision medicine.
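The principle described above, reconstructing a clean input from a corrupted one, can be shown with a very small NumPy denoising autoencoder. This is a sketch under assumptions made here, not the model used in the study: masking noise, one tanh hidden layer, a linear output, mean squared error, and plain full-batch gradient descent; all names and hyperparameters are hypothetical.

```python
import numpy as np

def train_dae(X, n_hidden=8, noise=0.3, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder with masking noise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        mask = rng.random(X.shape) >= noise   # drop each input with prob. `noise`
        Xc = X * mask                         # corrupted input
        H = np.tanh(Xc @ W1 + b1)             # hidden representation
        R = H @ W2 + b2                       # reconstruction
        G = 2.0 * (R - X) / (n * d)           # gradient of MSE against the CLEAN input
        GH = (G @ W2.T) * (1.0 - H ** 2)      # back-propagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (Xc.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2

def reconstruct(X, params):
    """Run uncorrupted data through the trained encoder and decoder."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2
```

The key detail is that the loss compares the reconstruction with the clean `X`, not with the corrupted `Xc`; the hidden activations `np.tanh(X @ W1 + b1)` are the features one would pass to a downstream analysis.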
Accurate fault prediction is essential to ensure the safety and reliability of combine harvester operation. In this study, a combine harvester fault prediction method based on a combination of stacked denoising autoencoders (SDAE) and multi-classification support vector machines (SVM) is proposed to predict combine harvester faults by extracting operational features of key combine components. In general, an SDAE contains autoencoders and uses a deep network architecture to learn complex non-linear input-output relationships in a hierarchical manner. Selected features are fed into the SDAE network, deep-level features of the input parameters are extracted by the SDAE, and an SVM classifier is then added to its top layer to achieve combine harvester fault prediction. The experimental results show that the method can achieve accurate and efficient combine harvester fault prediction. In particular, the experiments used Gaussian noise with a distribution center of 0.05 to corrupt the test data samples obtained by random sampling of the whole population, and the results showed that the prediction accuracy of the method was 95.31%, indicating better robustness and generalization ability than SVM (77.03%), BP (74.61%), and SAE (90.86%).
An optimal configuration method for a multi-energy microgrid system based on the deep joint generation of source-load-temperature scenarios is proposed to improve multi-energy complementation and the reliability of energy supply in extreme scenarios. First, based on historical meteorological data, typical meteorological clusters and extreme temperature types are obtained. Then, to reflect the uncertainty of energy consumption and renewable energy output in different weather types, a deep joint generation model for radiation-electric load-temperature scenarios based on a denoising variational autoencoder is established for each weather module. At the same time, to cover potential high-energy-consumption scenarios with extreme temperatures, the extreme scenarios with fewer data samples are expanded. The scenarios are then reduced by clustering analysis. The normal days of different typical scenarios and the extreme temperature scenarios are determined, and the cooling and heating loads are determined by temperature. Finally, the optimal configuration of a multi-energy microgrid system is carried out. Experiments show that the optimal configuration based on the extreme and typical scenarios can improve the power supply reliability of the system. The proposed method can accurately capture the complementary potential of energy sources, and the economy of the system configuration is improved by 14.56%.
The cross-section profile is a key signal for evaluating hot-rolled strip quality, and ignoring its defects can easily lead to a final failure. The complex curve shape, significant irregular fluctuation and imperfect sample data make recognizing cross-section defects a challenge, and current industrial judgment methods rely excessively on human decision making. A novel stacked denoising autoencoders (SDAE) model optimized with support vector machine (SVM) theory was proposed for the recognition of cross-section defects. Firstly, interpolation filtering and principal component analysis were employed to linearly reduce the data dimensionality of the profile curve. Secondly, the deep learning algorithm SDAE was used for greedy layer-by-layer unsupervised feature learning, its final back-propagation neural network layer was replaced by an SVM for supervised learning of the final features, and the final model, SDAE_SVM, was obtained by further optimizing the entire network parameters via error back-propagation. Finally, curve mirroring and combination stitching were used as data augmentation for the training set, which dealt with the problem of sample imbalance in the original data set and further improved the accuracy of cross-section defect prediction. The approach was applied in a 1780-mm hot rolling line of a steel mill to achieve automatic diagnosis and classification of defects in the cross-section profile of hot-rolled strip, which helps to reduce flatness quality concerns in downstream processes.
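The linear dimensionality-reduction step mentioned above can be sketched with principal component analysis computed from a singular value decomposition. This is a generic illustration; the abstract does not give the number of components or the preprocessing details, so `n_components` and the function names are assumptions.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components.

    Returns the low-dimensional scores, the component directions and the
    feature means needed to map scores back to the original space.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    scores = (X - mean) @ components.T
    return scores, components, mean

def pca_reconstruct(scores, components, mean):
    """Map low-dimensional scores back to the original feature space."""
    return scores @ components + mean
```

In a pipeline like the one described, the `scores` would be the reduced profile features passed on to the SDAE.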
With the ever-growing dynamicity, complexity, and volume of information resources, the recommendation technique has been proposed and has become one of the most effective techniques for solving the so-called problem of information overload. Traditional recommendation algorithms, such as user-based or item-based collaborative filtering, measure the degree of similarity between users or items with only a single criterion, i.e., ratings. According to previous studies, a single criterion cannot accurately measure the similarity between user preferences or items. In recent years, the application of deep learning techniques has gained significant momentum in recommender systems for better understanding of user preferences, item characteristics, and historical interactions. In this work, we integrate plot information as auxiliary information into the denoising autoencoder (DAE); the resulting model, called SemRe-DCF, aims at learning semantic representations of item descriptions and succeeds in capturing fine-grained semantic regularities by using vector arithmetic to obtain better rating prediction. The results show that the proposed method can effectively improve prediction accuracy and solve the cold start problem.
Purpose: The aim of this study is to propose a deep neural network (DNN) method that uses side information to improve clustering results for big datasets; the authors also show that applying this information improves clustering performance and increases the speed of network training convergence. Design/methodology/approach: In data mining, semisupervised learning is an interesting approach because good performance can be achieved with a small subset of labeled data; one reason is that data labeling is expensive, and semisupervised learning does not need all labels. One type of semisupervised learning is constrained clustering; this type of learning does not use class labels for clustering. Instead, it uses information about some pairs of instances (side information), which may be in the same cluster (must-link [ML]) or in different clusters (cannot-link [CL]). Constrained clustering has been studied extensively; however, few works have focused on constrained clustering for big datasets. In this paper, the authors present a constrained clustering method for big datasets that uses a DNN. The authors inject the constraints (ML and CL) into this DNN to improve clustering performance and call the method constrained deep embedded clustering (CDEC). An autoencoder is implemented to elicit informative low-dimensional features in the latent space, and the encoder network is then retrained using a proposed Kullback-Leibler divergence objective function, which captures the constraints, in order to cluster the projected samples. The proposed CDEC was compared with the adversarial autoencoder, constrained 1-spectral clustering and autoencoder + k-means on the well-known MNIST, Reuters-10k and USPS datasets, and their performance was assessed in terms of clustering accuracy. Empirical results confirmed the statistical superiority of CDEC over its counterparts in terms of clustering accuracy. Findings: First, this is the first DNN-based constrained clustering method that uses side information to improve clustering performance without using labels in big, high-dimensional datasets. Second, the authors define a formula to inject side information into the DNN. Third, the proposed method improves clustering performance and network convergence speed. Originality/value: Few works have focused on constrained clustering for big datasets; studies of DNNs for clustering with a specific loss function that simultaneously extracts features and clusters the data are also rare. The method improves the performance of big data clustering without using labels, which is important because data labeling is expensive and time-consuming, especially for big datasets.
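To make the Kullback-Leibler clustering objective more concrete, the sketch below shows the ingredients of the standard deep embedded clustering (DEC) loss, on which methods of this kind typically build: a Student's t soft assignment of latent points to cluster centers, a sharpened target distribution, and the KL divergence between the two. The abstract does not give the formula by which the ML and CL constraints enter the objective, so that part is deliberately left out; the function names are assumptions.

```python
import numpy as np

def soft_assign(Z, centers):
    """Student's t soft assignment q_ij of latent point i to cluster j."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + d2)                       # one degree of freedom
    return q / q.sum(axis=1, keepdims=True)    # normalize over clusters

def target_distribution(q):
    """Sharpened targets p_ij: square q and divide by soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))
```

During retraining, the encoder and the cluster centers would be updated to reduce `kl_divergence(target_distribution(q), q)`; a constrained variant would add terms for the must-link and cannot-link pairs.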
Guaranteeing the safety of equipment is extremely important in industry. To improve the reliability and availability of equipment, various methods for prognostics and health management (PHM) have been proposed. Predicting the remaining useful life (RUL) of industrial equipment is a key aspect of PHM and remains one of its most challenging issues. With the rapid development of industrial equipment and sensing technology, an increasing amount of data on the health level of equipment can be obtained for RUL prediction. This paper proposes a hybrid data-driven approach, named RULESS, based on the stacked denoising autoencoder (SDAE) and similarity theory for estimating the remaining useful life of industrial equipment. Our work makes the most of the SDAE and similarity theory to improve the accuracy of RUL prediction. The effectiveness of the proposed approach was evaluated using aircraft engine health data simulated by the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS).
Funding (Urdu Nastaleeq ligature recognition study): Support from the National Natural Science Foundation of China (Project No. 61273365) and the 111 Project (No. B08004) is gratefully acknowledged.
Funding (wireless sensor network data cleaning study): This work is supported by the National Natural Science Foundation of China (Grant No. 61672282) and the Basic Research Program of Jiangsu Province (Grant No. BK20161491).
Funding: Supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2021S1A5A2A01061459).
Abstract: Object detection, one of the core research topics in computer vision, is extensively used in various industrial activities. Although there have been many studies of daytime images, in which objects can be detected easily, there is relatively little research on nighttime images. At night, various types of noise, such as darkness, haze, and light blur, deteriorate image quality. Thus, an appropriate noise removal process must precede detection to improve object detection performance. Although there are many studies on removing individual types of noise, only a few handle multiple types simultaneously. In this paper, we propose a convolutional denoising autoencoder (CDAE)-based architecture trained on various types of noise. We also present various ways of composing the modules for each noise type to improve object detection performance on night images. Experimental results on the Exclusively Dark (ExDark) image dataset show that the sequential filtering architecture achieved superior mean average precision (mAP) compared to the other architectures.
Funding: Natural Science Foundation of Hunan Province, Grant/Award Number: 2022JJ30755.
Abstract: Effective clinical trials are necessary for understanding medical advances, but early termination of trials can result in unnecessary waste of resources. Survival models can be used to predict survival probabilities in such trials. However, survival data from clinical trials are sparse, and DeepSurv cannot accurately capture their effective features, which weakens the models' generalization and decreases their prediction accuracy. In this paper, we propose a survival prediction model for clinical trial completion based on the combination of a denoising autoencoder (DAE) and the DeepSurv model. The DAE is used to obtain a robust representation of features by breaking the loop of raw features after autoencoder training, and the robust features are then provided to DeepSurv as input for training. The clinical trial dataset for training the model was obtained from ClinicalTrials.gov. A study of clinical trial completion in pregnant women was conducted in response to the fact that many current clinical trials exclude pregnant women. The experimental results showed that the denoising autoencoder and deep survival regression (DAE-DSR) model was able to extract meaningful and robust features for survival analysis; the C-index values on the training and test datasets were 0.74 and 0.75, respectively. Compared with the Cox proportional hazards model and the DeepSurv model, the survival analysis curves obtained with the DAE-DSR model had more prominent features, and the model was more robust and performed better in actual prediction.
Funding: This work is supported in part by the National Science Foundation of China (Grant Nos. 61672392, 61373038) and in part by the National Key Research and Development Program of China (Grant No. 2016YFC1202204).
Abstract: Software defect prediction plays a very important role in software quality assurance; it aims to inspect as many potentially defect-prone software modules as possible. However, the performance of the prediction model is susceptible to the high dimensionality of datasets that contain irrelevant and redundant features. In addition, the software metrics used for defect prediction are almost entirely traditional features, in contrast to the deep semantic feature representations produced by deep learning techniques. To address these two issues, we propose two solutions in this paper: (1) We leverage a novel non-linear manifold learning method, SOINN Landmark Isomap (SL-Isomap), to extract representative features by automatically selecting a reasonable number and position of landmarks, which can reveal the complex intrinsic structure hidden behind the defect data. (2) We propose a novel defect prediction model named DLDD based on hybrid deep learning techniques, which leverages a denoising autoencoder to learn true input features that are not contaminated by noise, and utilizes a deep neural network to learn abstract deep semantic features. We combine the squared error loss function of the denoising autoencoder with the cross entropy loss function of the deep neural network, and achieve the best prediction performance by adjusting a hyperparameter. We compare SL-Isomap with seven state-of-the-art feature extraction methods and compare the DLDD model with six baseline models across 20 open source software projects. The experimental results verify the superiority of SL-Isomap and DLDD on four evaluation indicators.
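The abstract above combines a reconstruction loss with a classification loss and balances them through one hyperparameter, but it does not give the exact formula. The sketch below assumes a simple convex combination controlled by a weight `lam`; that form, the function names, and the single-sample binary setting are all assumptions made for illustration, not the DLDD definition.

```python
import math


def squared_error(x, x_hat):
    """Squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def cross_entropy(label, prob, eps=1e-12):
    """Binary cross entropy for one sample; `prob` is P(defective)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def combined_loss(x, x_hat, label, prob, lam=0.5):
    """Assumed form: lam * reconstruction + (1 - lam) * classification."""
    return lam * squared_error(x, x_hat) + (1.0 - lam) * cross_entropy(label, prob)


# Hypothetical sample: two features, true label 1, predicted probability 0.5.
only_reconstruction = combined_loss([1.0, 0.0], [0.5, 0.0], 1, 0.5, lam=1.0)
only_classification = combined_loss([1.0, 0.0], [0.5, 0.0], 1, 0.5, lam=0.0)
balanced = combined_loss([1.0, 0.0], [0.5, 0.0], 1, 0.5, lam=0.5)
```

Setting `lam` to 1 or 0 recovers either loss on its own, so sweeping it between those extremes is one way to search for the balance the abstract describes.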
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61472289) and the National Key Research and Development Project (2016YFC0106305).
Abstract: In recent years, numerous works have been proposed that leverage deep learning techniques to improve social-aware recommendation performance. In most cases, a large amount of data is required to train a robust deep learning model, which contains many parameters to fit to the training data. However, both user rating data and social network data suffer from a critical sparsity problem, which makes it difficult to train a robust deep neural network model. To address this problem, we propose a novel correlative denoising autoencoder (CoDAE) method that takes correlations between users with multiple roles into account to learn robust representations from sparse inputs of ratings and social networks for recommendation. We develop the CoDAE model using three separate autoencoders to learn user features for the roles of rater, truster, and trustee, respectively. In particular, because each input unit of the user vectors for the truster and trustee roles corresponds to a particular user, we propose to use shared parameters to learn the common information of the units that correspond to the same users. Moreover, we propose a related regularization term to learn correlations between the user features learnt by the three subnetworks of the CoDAE model. We further conduct a series of experiments to evaluate the proposed method on two public datasets for the Top-N recommendation task. The experimental results demonstrate that the proposed model outperforms state-of-the-art algorithms on the rank-sensitive metrics MAP and NDCG.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 61372164 to XX, 61471112 to WG, and 61571109 to WG), the Key R&D Program of Jiangsu Province, China (Grant No. BE2016002-3 to WG), the Fundamental Research Funds for the Central Universities, China (Grant No. 2242017K3DN04 to WG), the Clinical Research Cultivation Program, China (Grant No. 2017CX010 to LC), and the Social Development Foundation of Jiangsu Province, Clinical Frontier Technology, China (Grant No. BE2018746 to LC).
Abstract: Precise biomarker development is a key step in disease management. However, most published biomarkers were derived from a relatively small number of samples with supervised approaches. Recent advances in unsupervised machine learning promise to leverage very large datasets for making better predictions of disease biomarkers. The denoising autoencoder (DA) is an unsupervised deep learning algorithm and a stochastic version of the autoencoder. The principle of the DA is to force the hidden layer of the autoencoder to capture more robust features by reconstructing a clean input from a corrupted one. Here, a DA model was applied to analyze integrated transcriptomic data from 13 published lung cancer studies, which consisted of 1916 human lung tissue samples. Using the DA, we discovered a molecular signature composed of multiple genes for lung adenocarcinoma (ADC). In independent validation cohorts, the proposed molecular signature proved to be an effective classifier for lung cancer histological subtypes. This signature also successfully predicts clinical outcome in lung ADC, independently of traditional prognostic factors. More importantly, it exhibits superior prognostic power compared with other published prognostic genes. Our study suggests that unsupervised learning is helpful for biomarker development in the era of precision medicine.
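The principle stated in the abstract above, reconstructing a clean input from a corrupted one, can be written compactly. The block below gives the standard single-layer denoising autoencoder objective as a sketch; the symbols (the corruption process $q$, the activation $s$, the parameters $W, b, W', b'$, and the reconstruction loss $L$) are notation assumed here for illustration and do not come from the paper.

```latex
\begin{align}
  \tilde{x} &\sim q(\tilde{x} \mid x)
    && \text{corrupt the clean input } x \\
  h &= s(W\tilde{x} + b)
    && \text{encode the corrupted input} \\
  \hat{x} &= s(W'h + b')
    && \text{decode to a reconstruction} \\
  \min_{W,\,b,\,W',\,b'} \;&
    \mathbb{E}_{x,\;\tilde{x} \sim q(\tilde{x} \mid x)}
    \big[\, L(x, \hat{x}) \,\big]
    && \text{compare against the clean } x
\end{align}
```

The loss compares $\hat{x}$ with the clean $x$ rather than with $\tilde{x}$, so the hidden layer cannot simply copy its input. That is the sense in which the features are "more robust".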
Funding: The work was sponsored by the Intelligent Manufacturing Comprehensive Standardization Project (No. 2018GXZ1101011), the National Key Research and Development Program of China Sub-project (No. 2016YFD0701802), and the Natural Science Foundation of Henan (No. 202300410124).
Abstract: Accurate fault prediction is essential to ensure the safety and reliability of combine harvester operation. In this study, a combine harvester fault prediction method based on a combination of stacked denoising autoencoders (SDAE) and multi-classification support vector machines (SVM) is proposed, which predicts combine harvester faults by extracting operational features of key combine components. In general, an SDAE contains autoencoders and uses a deep network architecture to learn complex non-linear input-output relationships in a hierarchical manner. Selected features are fed into the SDAE network, deep-level features of the input parameters are extracted by the SDAE, and an SVM classifier is then added to its top layer to achieve combine harvester fault prediction. The experimental results show that the method can achieve accurate and efficient combine harvester fault prediction. In particular, the experiments used Gaussian noise with a distribution center of 0.05 to corrupt the test data samples obtained by random sampling of the whole population, and the results showed that the prediction accuracy of the method was 95.31%, indicating better robustness and generalization ability than SVM (77.03%), BP (74.61%), and SAE (90.86%).
Funding: Supported by the National Key Research and Development Program of China (2019YFB1505400), the Jilin Science and Technology Development Program (20160411003XH), and the Jilin Industrial Technology Research and Development Program (2019C058-8).
Abstract: An optimal configuration method for a multi-energy microgrid system based on the deep joint generation of source-load-temperature scenarios is proposed to improve multi-energy complementation and the reliability of energy supply in extreme scenarios. First, typical meteorological clusters and extreme temperature types are obtained from historical meteorological data. Then, to reflect the uncertainty of energy consumption and renewable energy output under different weather types, a deep joint generation model using a radiation-electric load-temperature scenario based on a denoising variational autoencoder is established for each weather module. At the same time, to cover potential high energy consumption scenarios with extreme temperatures, the extreme scenarios with fewer data samples are expanded. The scenarios are then reduced by clustering analysis. The normal days of the different typical scenarios and the extreme temperature scenarios are determined, and the cooling and heating loads are determined from temperature. Finally, the optimal configuration of the multi-energy microgrid system is carried out. Experiments show that an optimal configuration based on both the extreme scenarios and the typical scenarios can improve the power supply reliability of the system. The proposed method can accurately capture the complementary potential of the energy sources, and the economy of the system configuration is improved by 14.56%.
Funding: Supported by the National Natural Science Foundation of China (No. 52004029), the Joint Doctoral Program of the China Scholarship Council (CSC) (202006460073), and the Liuzhou Science and Technology Plan Project, China (2021AAD0102).
Abstract: The cross-section profile is a key signal for evaluating hot-rolled strip quality, and ignoring its defects can easily lead to a final failure. Complex curves, significant irregular fluctuations, and imperfect sample data make the recognition of cross-section defects challenging, and current industrial judgment methods rely excessively on human decision making. A novel stacked denoising autoencoders (SDAE) model optimized with support vector machine (SVM) theory was proposed for the recognition of cross-section defects. First, interpolation filtering and principal component analysis were employed to linearly reduce the dimensionality of the profile curve data. Second, the SDAE deep learning algorithm was used for greedy layer-by-layer unsupervised feature learning; its final back-propagation neural network layer was replaced by an SVM for supervised learning of the final features, and the final model, SDAE_SVM, was obtained by further optimizing the parameters of the entire network via error back-propagation. Finally, curve mirroring and combination stitching were used as data augmentation for the training set, which addressed the sample imbalance in the original data set and further improved the accuracy of cross-section defect prediction. The approach was applied on a 1780-mm hot rolling line of a steel mill to achieve automatic diagnosis and classification of defects in the cross-section profile of hot-rolled strip, which helps to reduce flatness quality concerns in downstream processes.
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 71473035 and 11501095, the Fundamental Research Funds for the Central Universities of China under Grant No. 2412017QD028, the China Postdoctoral Science Foundation under Grant No. 2017M021192, the Scientific and Technological Development Program of Jilin Province of China under Grant Nos. 20180520022JH, 20150204040GX, and 20170520051JH, the Jilin Province Development and Reform Commission Project of China under Grant Nos. 2015Y055 and 2015Y054, and the Natural Science Foundation of Jilin Province of China under Grant No. 20150101057JC.
Abstract: With the ever-growing dynamicity, complexity, and volume of information resources, the recommendation technique has been proposed and has become one of the most effective techniques for solving the so-called problem of information overload. Traditional recommendation algorithms, such as user-based or item-based collaborative filtering, measure the degree of similarity between users or items with only a single criterion, i.e., ratings. According to previous studies, a single criterion cannot accurately measure the similarity between user preferences or items. In recent years, the application of deep learning techniques has gained significant momentum in recommender systems for better understanding of user preferences, item characteristics, and historical interactions. In this work, we integrate plot information as auxiliary information into the denoising autoencoder (DAE). The resulting model, called SemRe-DCF, aims at learning semantic representations of item descriptions and succeeds in capturing fine-grained semantic regularities by using vector arithmetic to obtain better rating predictions. The results show that the proposed method can effectively improve prediction accuracy and alleviate the cold start problem.
Abstract: Purpose: The aim of this study is to propose a deep neural network (DNN) method that uses side information to improve clustering results for big datasets; the authors also show that applying this information improves clustering performance and increases the convergence speed of network training. Design/methodology/approach: In data mining, semisupervised learning is an interesting approach because good performance can be achieved with a small subset of labeled data; one reason is that data labeling is expensive, and semisupervised learning does not need all labels. One type of semisupervised learning is constrained clustering, which does not use class labels for clustering. Instead, it uses information about some pairs of instances (side information), which may be in the same cluster (must-link [ML]) or in different clusters (cannot-link [CL]). Constrained clustering has been studied extensively; however, few works have focused on constrained clustering for big datasets. In this paper, the authors present a constrained clustering method for big datasets that uses a DNN. The authors inject the constraints (ML and CL) into this DNN to improve clustering performance and call the method constrained deep embedded clustering (CDEC). An autoencoder was implemented to elicit informative low-dimensional features in the latent space, and the encoder network was then retrained using a proposed Kullback-Leibler divergence objective function, which captures the constraints, in order to cluster the projected samples. The proposed CDEC was compared with the adversarial autoencoder, constrained 1-spectral clustering, and autoencoder + k-means on the well-known MNIST, Reuters-10k, and USPS datasets, and their performance was assessed in terms of clustering accuracy. Empirical results confirmed the statistical superiority of CDEC over its counterparts in terms of clustering accuracy. Findings: First, this is the first DNN-constrained clustering method that uses side information to improve clustering performance without using labels on big, high-dimensional datasets. Second, the authors defined a formula to inject side information into the DNN. Third, the proposed method improves clustering performance and network convergence speed. Originality/value: Few works have focused on constrained clustering for big datasets; studies of DNNs for clustering with a specific loss function that simultaneously extracts features and clusters the data are also rare. The method improves the performance of big data clustering without using labels, which is important because data labeling is expensive and time-consuming, especially for big datasets.
Funding: The National Key Research and Development Project of China (No. 2018YFB1702600, 2018YFB1702602), the National Natural Science Foundation of China (No. 61402167, 61772193, 61872139), the Hunan Provincial Natural Science Foundation of China (No. 2017JJ4036, 2018JJ2139), and the Research Foundation of Hunan Provincial Education Department of China (No. 17K033, 19A174).
Abstract: Guaranteeing the safety of equipment is extremely important in industry. To improve the reliability and availability of equipment, various methods for prognostics and health management (PHM) have been proposed. Predicting the remaining useful life (RUL) of industrial equipment is a key aspect of PHM and has always been one of its most challenging issues. With the rapid development of industrial equipment and sensing technology, an increasing amount of data on the health level of equipment can be obtained for RUL prediction. This paper proposes a hybrid data-driven approach, named RULESS, based on a stacked denoising autoencoder (SDAE) and similarity theory for estimating the remaining useful life of industrial equipment. Our work makes the most of the SDAE and similarity theory to improve the accuracy of RUL prediction. The effectiveness of the proposed approach was evaluated using aircraft engine health data simulated by the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS).