Funding: Supported by the National Natural Science Foundation of China (61273160) and the Fundamental Research Funds for the Central Universities (14CX06067A, 13CX05021A).
Abstract: In the soft sensor field, just-in-time learning (JITL) is an effective approach to modeling nonlinear and time-varying processes. However, most similarity criteria in JITL are computed in the input space only and ignore important output information, which may lead to an inaccurately constructed relevant sample set. To solve this problem, we propose a novel supervised feature extraction method suited to regression problems, called supervised local and non-local structure preserving projections (SLNSPP), in which both input and output information are incorporated through a newly defined similarity index. SLNSPP not only retains the virtue of locality preserving projections but also prevents faraway points from moving close together after projection, which gives SLNSPP strong discriminating ability. These two properties are desirable for JITL because they are expected to improve the accuracy of similar sample selection. Consequently, we present an SLNSPP-JITL framework for developing adaptive soft sensors, including a sparse learning strategy to limit the scale and update frequency of the database. Finally, two case studies are conducted on benchmark datasets to evaluate the performance of the proposed schemes. The results demonstrate the effectiveness of LNSPP and SLNSPP.
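The JITL step that this abstract builds on can be illustrated with a minimal sketch. It shows only the conventional input-space baseline that the abstract criticizes (select the nearest stored samples, then fit a local model), not SLNSPP itself, whose similarity index is not given here. The function names, the Euclidean distance, and the distance-weighted average used as the local model are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jitl_predict(database, query, k=3):
    """Predict the output for `query` from its k most similar stored samples.

    `database` is a list of (input_vector, output) pairs. Similarity is
    measured in the input space only (an assumption of this sketch).
    """
    # Build the relevant sample set: the k samples closest to the query.
    ranked = sorted(database, key=lambda sample: euclidean(sample[0], query))
    relevant = ranked[:k]
    # Local model (assumed): inverse-distance-weighted average of outputs.
    weights = [1.0 / (euclidean(x, query) + 1e-9) for x, _ in relevant]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, relevant)) / total


database = [
    ([0.0, 0.0], 1.0),
    ([0.1, 0.0], 1.2),
    ([5.0, 5.0], 9.0),
    ([0.0, 0.2], 0.9),
]
prediction = jitl_predict(database, [0.05, 0.05], k=3)
```

A supervised criterion of the kind the abstract proposes would change how `ranked` is computed, so that samples with similar outputs are also favored during selection.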
Abstract: This article proposes new damage classifiers, formulated as a supervised learning problem, for locating and quantifying damage. A new feature extraction approach based on time series analysis is introduced to extract damage-sensitive features from auto-regressive (AR) models. This approach aims to improve current feature extraction techniques in the context of time series modeling. The coefficients and residuals of the AR model obtained from the proposed approach are selected as the main features and are applied to the proposed supervised learning classifiers, which are categorized as coefficient-based and residual-based classifiers. These classifiers compute the relative errors in the extracted features between the undamaged and damaged states. The ability of the proposed methods to localize and quantify single and multiple damage scenarios is verified using experimental data from a laboratory frame and a four-story steel structure. Comparative analyses are performed to validate the superiority of the proposed methods over some existing techniques. Results show that the proposed classifiers, with the aid of features extracted by the proposed approach, are able to locate and quantify damage; however, the residual-based classifiers yield better results than the coefficient-based classifiers. Moreover, these methods are superior to some classical techniques.
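As a rough illustration of a coefficient-based index, the sketch below fits a first-order AR model by ordinary least squares and reports the relative error of its coefficient between an undamaged and a damaged state. This is a plain AR(1) fit on noise-free synthetic signals, not the article's feature extraction approach; the model order, the helper names, and the signals are assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = phi * x[t-1] + e[t]; returns (phi, residuals)."""
    pairs = list(zip(series, series[1:]))
    phi = sum(x1 * x0 for x0, x1 in pairs) / sum(x0 * x0 for x0, _ in pairs)
    residuals = [x1 - phi * x0 for x0, x1 in pairs]
    return phi, residuals


def relative_error(reference, observed):
    """Relative error of `observed` with respect to `reference`."""
    return abs(observed - reference) / abs(reference)


def simulate(phi, n=50, x0=1.0):
    """Noise-free AR(1) signal, used here only as stand-in measurement data."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(phi * xs[-1])
    return xs


phi_undamaged, _ = fit_ar1(simulate(0.5))
phi_damaged, _ = fit_ar1(simulate(0.8))
# Coefficient-based damage index: larger values mean a larger change of state.
coefficient_index = relative_error(phi_undamaged, phi_damaged)
```

A residual-based index would follow the same pattern, comparing a statistic of the `residuals` (for example their standard deviation) between the two states.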
Funding: National Natural Science Foundation of China (No. 61072090, 60874113).
Abstract: Semi-supervised dimensionality reduction is an important research area for data classification. A new linear dimensionality reduction approach, global inference preserving projection (GIPP), is proposed for classification in the semi-supervised case. GIPP provides a global structure that exploits the underlying discriminative knowledge of unlabeled samples. It uses a path-based dissimilarity measure to infer class label information for unlabeled samples and transforms the discriminant algorithm into a generalized eigenequation problem. Experimental results demonstrate the effectiveness of the proposed approach.
Abstract: In this paper, we propose a new semi-supervised multi-manifold learning method, called semi-supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both labeled and unlabeled data to adaptively find the neighbors of each sample from the same manifold using an optimization program based on sparse representation, and naturally gives relative importance to the labeled samples through a graph-based methodology. It then extracts discriminative features on each manifold such that data points in the same manifold become closer. The effectiveness of the proposed multi-manifold learning algorithm is demonstrated and compared through experiments on real hyperspectral images.
Funding: Supported by the National Natural Science Foundation of China (61906050, 21365008), the Guangxi Technology R&D Program (2018AD11018), and the Innovation Project of GUET Graduate Education (2021YCXS050).
Abstract: Drug supervision methods based on near-infrared spectroscopy analysis depend heavily on the chemometrics model that characterizes the relationship between spectral data and drug categories. The preliminary application of convolutional neural networks (CNNs) in spectral analysis demonstrates excellent end-to-end prediction ability, but the approach is sensitive to the hyperparameters of the network. The transformer is a deep-learning model based on the self-attention mechanism that is comparable to CNNs in predictive performance and has an easy-to-design model structure. Hence, a novel calibration model named SpectraTr, based on the transformer structure, is proposed and used for the qualitative analysis of drug spectra. The experimental results on seven classes of drugs and 18 classes of drugs show that the proposed SpectraTr model can automatically extract features from a huge number of spectra, does not depend on pre-processing algorithms, and is insensitive to model hyperparameters. When the ratio of the training set to the test set is 8:2, the prediction accuracy of the SpectraTr model reaches 100% and 99.52%, respectively, which outperforms PLS-DA, SVM, SAE, and CNN. The model is also tested on a public drug data set and achieves a classification accuracy of 96.97% without any pre-processing algorithm, which is 34.85%, 28.28%, 5.05%, and 2.73% higher than PLS-DA, SVM, SAE, and CNN, respectively. The research shows that the SpectraTr model performs exceptionally well in spectral analysis and is expected to be a novel deep calibration model after autoencoder networks (AEs) and CNNs.
Abstract: Discussion forums are an indispensable interactive component of Massive Open Online Courses (MOOCs). However, current discussion forums are not well organized. Trouble-shooting threads are valuable for both learners and instructors, but they are drowned out in forums that contain huge numbers of threads. This work first builds a labeled data set for trouble-shooting thread structure prediction by crowdsourcing and then proposes methods for trouble-shooting thread detection and thread structure prediction on that data set. The output of this work can be used to spot trouble-shooting threads and show them along with structure tags in MOOC discussion forums.
Funding: Funded by the R&D Department of China National Petroleum Corporation (2022DQ0604-04), the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-03), and the Science Research and Technology Development of PetroChina (2021DJ1206).
Abstract: Seismic impedance inversion is an important technique for structure identification and reservoir prediction. Model-based and data-driven impedance inversion are the commonly used inversion methods. In practice, the geophysical inversion problem is essentially ill-posed, which means that many solutions correspond to the same seismic data. Therefore, regularization schemes, which can provide stable and unique inversion results to some extent, have been introduced into the objective function as constraint terms. Among them, supplying a low-frequency initial impedance model is the most commonly used regularization method and can provide a smooth and stable solution. However, this model-based inversion method relies heavily on the initial model, and the inversion result is band-limited to the effective frequency bandwidth of the seismic data, so it cannot effectively improve the seismic vertical resolution and is difficult to apply to complex structural regions. Therefore, we propose a data-driven approach for high-resolution impedance inversion based on the bidirectional long short-term memory recurrent neural network, which treats seismic data as time series rather than image-like patches. Compared with the model-based inversion method, the data-driven approach provides higher-resolution inversion results, which demonstrates its effectiveness in recovering the high-frequency components. However, judging from the inversion results for characterizing the spatial distribution of thin-layer sands, the accuracy of the high-frequency components is difficult to guarantee. Therefore, we add a model constraint to the objective function to overcome the shortcomings of relying only on data-driven schemes. First, supervisor1 is constructed based on the bidirectional long short-term memory recurrent neural network, which provides the predicted impedance with higher resolution. Then, a convolution constraint is introduced into the objective function as supervisor2 to guarantee the reliability and accuracy of the inversion results by making the synthetic seismic data obtained from the inversion result consistent with the input data. Finally, we test the proposed scheme on synthetic and field seismic data. Compared with model-based and purely data-driven impedance inversion methods, the proposed approach provides more accurate and reliable inversion results with higher vertical resolution and better spatial continuity. The inversion results accurately characterize the spatial distribution of thin sands. The model tests demonstrate that the model-constrained, data-driven impedance inversion scheme can effectively improve thin-layer structure characterization based on seismic data. Moreover, tests on oil field data indicate the practicality and adaptability of the proposed method.
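The two-term objective described in this abstract can be sketched as follows, assuming the standard convolutional model: reflectivity is computed from adjacent impedance samples and convolved with a wavelet to obtain synthetic seismic data. The mean-squared-error form of both terms, the weight `lam`, the function names, and the toy wavelet are assumptions for illustration; the abstract does not state the actual loss or network details, and the network itself is omitted.

```python
def reflectivity(impedance):
    """Normal-incidence reflection coefficients between adjacent samples."""
    return [(z2 - z1) / (z2 + z1) for z1, z2 in zip(impedance, impedance[1:])]


def convolve_same(signal, wavelet):
    """Convolution trimmed to len(signal), centered on an odd-length wavelet."""
    half = len(wavelet) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(wavelet):
            k = i + half - j
            if 0 <= k < len(signal):
                acc += w * signal[k]
        out.append(acc)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(predicted, labels, seismic, wavelet, lam=1.0):
    """Supervised term (labels) plus convolution-constraint term (seismic)."""
    supervised = mse(predicted, labels)
    synthetic = convolve_same(reflectivity(predicted), wavelet)
    return supervised + lam * mse(synthetic, seismic)


wavelet = [-0.5, 1.0, -0.5]  # toy symmetric wavelet (assumed)
true_impedance = [2000.0, 2000.0, 2600.0, 2600.0, 2200.0, 2200.0]
seismic = convolve_same(reflectivity(true_impedance), wavelet)
perturbed = [z * 1.05 if i == 3 else z for i, z in enumerate(true_impedance)]
loss_true = combined_loss(true_impedance, true_impedance, seismic, wavelet)
loss_perturbed = combined_loss(perturbed, true_impedance, seismic, wavelet)
```

The second term penalizes any predicted impedance whose synthetic trace disagrees with the recorded seismic data, which is the role the abstract assigns to supervisor2.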
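Abstract: Tri-training uses unlabeled data for classification and can effectively improve the generalization ability of classifiers, but it is prone to mislabeling unlabeled data, which introduces training noise. A Tri-training algorithm based on density peaks clustering (Tri-training with density peaks clustering, DPC-TT) is proposed. Density peaks clustering uses cluster centers and local density to select samples that are well represented in the spatial structure of the data. The DPC-TT algorithm applies density peaks clustering to obtain the cluster centers of the training data and the local density of each sample. Samples within the cutoff distance of a cluster center are regarded as well represented in the spatial structure and are marked as core data. Updating the classifiers with the core data reduces the training noise during iteration and thereby improves classifier performance. Experimental results show that, compared with the standard Tri-training algorithm and its improved variants, the DPC-TT algorithm achieves better classification performance.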
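The core-data selection step can be sketched as below, under stated assumptions: local density is the count of neighbors closer than the cutoff distance, cluster centers are the points with the largest product of density and distance to the nearest higher-density point, and core data are the samples within the cutoff distance of a center. The function names, the ranking rule, and the toy points are illustrative choices, and the Tri-training loop itself is omitted.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_peaks_core(points, cutoff, n_centers):
    """Return (center_indices, core_indices) for the given points."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff)
           for i in range(n)]
    # Delta: distance to the nearest point of higher density; the densest
    # point(s) take their largest distance to any other point instead.
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    # Cluster centers: highest rho * delta (a common decision rule).
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centers = order[:n_centers]
    # Core data: samples within the cutoff distance of any cluster center.
    core = sorted(i for i in range(n)
                  if any(d[i][c] <= cutoff for c in centers))
    return centers, core


points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0),
    (2.5, 2.5),  # isolated sample that should not become core data
]
centers, core = density_peaks_core(points, cutoff=0.12, n_centers=2)
```

In a DPC-TT style loop, only newly labeled samples whose indices appear in `core` would be used to update the classifiers.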