Extreme learning machine (ELM) is a machine learning method based on feedforward neural networks that offers short training times and strong generalization, and does not fall into local minima. However, because of its shallow architecture, the traditional ELM requires a large number of hidden nodes to maintain classification performance on high-dimensional data sets. In addition, its classification performance degrades easily when the data are noisy. To address these problems, this paper proposes SDAE-DPELM, a double pseudo-inverse extreme learning machine (DPELM) based on a Sparse Denoising AutoEncoder (SDAE). The algorithm determines the input and output weights of the network directly using the pseudo-inverse method, so it needs only a few hidden-layer nodes to produce superior classification results. Combining it with SDAE effectively improves classification performance and noise resistance. Extensive numerical experiments show that the algorithm achieves high classification accuracy and good robustness on both noisy and noiseless high-dimensional data. Applying the algorithm to Miao character recognition further substantiates its performance and illustrates its practicability.
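To make the pseudo-inverse step concrete, the following is a minimal sketch of a basic single-hidden-layer ELM, not the SDAE-DPELM algorithm of the paper. It assumes random, fixed input weights, a tanh activation, and output weights obtained from ridge-regularized normal equations, which approach the pseudo-inverse solution as the ridge term goes to zero. The names `solve`, `hidden`, `train_elm`, and `predict`, the parameter defaults, and the XOR toy data are all hypothetical choices made for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hidden(x, weights, biases):
    """Hidden-layer activations tanh(w . x + b) for one input vector."""
    return [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
            for ws, b in zip(weights, biases)]


def train_elm(X, y, n_hidden=20, ridge=1e-6, seed=0):
    """Basic ELM: input weights are random and never trained (assumption:
    uniform in [-1, 1]); only the output weights beta are fitted."""
    rng = random.Random(seed)
    d = len(X[0])
    weights = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden(x, weights, biases) for x in X]
    # Regularized normal equations: (H^T H + ridge * I) beta = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    beta = solve(A, rhs)
    return weights, biases, beta


def predict(model, x):
    """Network output: beta . hidden(x)."""
    weights, biases, beta = model
    return sum(b * h for b, h in zip(beta, hidden(x, weights, biases)))


# Toy usage on XOR, a problem that a purely linear model cannot fit.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.0, 1.0, 1.0, 0.0]
model = train_elm(X, y)
labels = [int(predict(model, x) > 0.5) for x in X]
print(labels)
```

Training here is a single linear solve with no iterative weight updates, which is the source of the short training times mentioned in the abstract. A library routine for the Moore-Penrose pseudo-inverse would normally replace the hand-written solver.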
Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, because outliers distort the statistics of geotechnical parameters and data sparsity introduces statistical uncertainty. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on Mahalanobis distance and identifies as outliers those data instances with outlying probabilities greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers with a re-sampling technique and accounts rationally for the statistical uncertainty through Bayesian machine learning. Moreover, the proposed approach also suggests an exclusive method to determine the outlying components of each outlier. The approach is illustrated and verified using simulated and real-life datasets. The results show that it properly identifies outliers among sparse multivariate data, and their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.
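As a point of reference for the Mahalanobis distance mentioned above, the sketch below shows the classical, non-probabilistic screening rule for bivariate data. It is not the re-sampling and Bayesian procedure of the paper. It assumes two-dimensional data, a sample mean and covariance estimated from the full (possibly contaminated) data set, and a chi-square cutoff at a hypothetical level of 0.95. The function names and the toy data are illustrative only.

```python
import math


def mean_and_cov(points):
    """Sample mean and unbiased 2x2 covariance of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, level=0.95):
    """Flag points whose squared distance exceeds the chi-square quantile at `level`."""
    mean, cov = mean_and_cov(points)
    flags = []
    for p in points:
        d2 = mahalanobis_sq(p, mean, cov)
        # With 2 degrees of freedom the chi-square CDF is 1 - exp(-d2 / 2).
        flags.append(1.0 - math.exp(-d2 / 2.0) > level)
    return flags


# Toy data: a 3x3 grid of regular points plus one distant point.
data = [(x, y) for x in (0, 1, 2) for y in (0, 1, 2)] + [(10, 10)]
flags = flag_outliers(data)
print(flags)
```

In this toy set only the last point, (10, 10), is flagged. Its squared distance is about 7.8, which is modest for such a distant point because the point itself inflates the estimated covariance. This is the kind of distortion of statistics by outliers that the abstract describes.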
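Bayesian probabilistic matrix factorization is widely used in personalized recommender systems because of its high prediction accuracy and good scalability, but its recommendation accuracy is affected by the sparsity of the initial rating matrix. This paper proposes GBPMF (generalized Gaussian distribution Bayesian PMF), a Bayesian probabilistic matrix factorization method based on the generalized Gaussian distribution. It uses the generalized Gaussian distribution as the prior, selects the optimal model parameters automatically through machine learning, and trains efficiently with Gibbs sampling, which effectively alleviates matrix sparsity and reduces prediction error. To account for the influence of rating time differences on prediction, a time factor is added to the sampling algorithm, further optimizing the method and improving prediction accuracy. Experimental results show that GBPMF and its optimized variant GBPMF-T achieve high accuracy on both non-sparse and sparse matrices, with GBPMF-T being the more accurate. When the matrix is very sparse, the accuracy of traditional Bayesian probabilistic matrix factorization drops sharply, whereas the proposed method remains comparatively stable.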
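For reference, the generalized Gaussian family is commonly written with the density below. The symbols (location \mu, scale \alpha, shape \beta) are notation assumed here; the abstract does not state which parameterization the paper uses.

```latex
p(x \mid \mu, \alpha, \beta)
  = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}
    \exp\!\left( -\left( \frac{|x - \mu|}{\alpha} \right)^{\beta} \right),
  \qquad \alpha > 0,\ \beta > 0 .
```

Setting \beta = 2 with \alpha = \sqrt{2}\,\sigma recovers the ordinary Gaussian density with standard deviation \sigma, and \beta = 1 gives the Laplace density, so the shape parameter controls how heavy the tails of the prior are.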
Funding: Supported by the National Key R&D Program of China (Project No. 2016YFC0800200), the NRF-NSFC 3rd Joint Research Grant (Earth Science) (Project No. 41861144022), the National Natural Science Foundation of China (Project Nos. 51679174 and 51779189), and the Shenzhen Key Technology R&D Program (Project No. 20170324). The financial support is gratefully acknowledged.