Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60905017 and 61072061; the National High Technology Research and Development Program of China (863 Program) under Grant No. 2009AA01A346; the 111 Project of China under Grant No. B08004; and the Special Project for Innovative Young Researchers of Beijing University of Posts and Telecommunications.
Abstract: As a generative model, Latent Dirichlet Allocation (LDA) focuses on how data are generated and does not optimize the discrimination capability of its topics. This paper aims to improve that discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the information gain of the word with respect to topics, which is used to distinguish between "general words" and "special words" in LDA topics. We therefore add a constraint to the LDA objective function so that "general words" occur only in "general topics" rather than in "special topics". A heuristic algorithm is then presented to obtain the solution. Experiments show that this method not only improves the information gain of topics but also makes the topics easier for humans to understand.
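The abstract approximates a word's discrimination capability by its information gain for topics. Below is a minimal sketch, assuming a topic-by-word count matrix `counts` (a hypothetical input, e.g. read from an LDA Gibbs sampling state). It defines IG as the reduction in entropy of a token's topic assignment when we learn whether the token is word v. The paper's exact definition and its threshold for "general" versus "special" words are not given, so treat this as an illustration only.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def word_information_gain(counts, v):
    """Information gain of word v about the topic assignment Z of a token.

    counts[k, v] = number of tokens of word v assigned to topic k (assumed input).
    IG(v) = H(Z) - P(W=v) H(Z | W=v) - P(W!=v) H(Z | W!=v)
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    topic_totals = counts.sum(axis=1)        # tokens per topic
    with_v = counts[:, v]                    # tokens of word v per topic
    without_v = topic_totals - with_v        # all other tokens per topic
    p_v = with_v.sum() / total
    h_z = entropy(topic_totals / total)
    h_v = entropy(with_v / with_v.sum()) if with_v.sum() > 0 else 0.0
    h_not_v = entropy(without_v / without_v.sum()) if without_v.sum() > 0 else 0.0
    return h_z - p_v * h_v - (1.0 - p_v) * h_not_v
```

Take a two-topic matrix `[[10, 10, 0], [10, 0, 10]]`. Word 0 is spread evenly over both topics and gets IG = 0, making it a candidate "general word". Words 1 and 2 each occur in only one topic and get positive IG.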
Funding: The Science and Technology Development Fund of the Macao Government (No. 007/2006/A).
Abstract: This paper proposes a novel feature selection method, LUIFS (latent utility of irrelevant feature selection). LUIFS not only selects the relevant features but also aims to discover irrelevant attributes that are latently useful, by measuring their supportive importance to other attributes. The method minimizes information loss while maximizing final classification accuracy. The classification error rates of LUIFS on 16 real-life datasets from the UCI machine learning repository were evaluated with the ID3, Naïve Bayes, and IB (instance-based classifier) learning algorithms. They were compared with those of the same algorithms under no feature selection (NoFS), feature subset selection (FSS), and correlation-based feature selection (CFS). The empirical results demonstrate that LUIFS can improve the performance of learning algorithms by taking into account the latent relevance of irrelevant attributes, and hence by including those potentially important attributes in the optimal feature subset for classification.
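The abstract does not define "supportive importance", so the following is only an illustration of why an attribute that looks irrelevant can still be useful. It is not the LUIFS measure. The sketch compares the mutual information I(X; C) of an attribute with the class against the conditional mutual information I(X; C | Y) given another attribute. Both are estimated from counts of discrete samples; the function names are hypothetical.

```python
import math
from collections import Counter

def mutual_info(xs, cs):
    """I(X; C) in bits, estimated from paired discrete samples."""
    n = len(xs)
    pxc, px, pc = Counter(zip(xs, cs)), Counter(xs), Counter(cs)
    return sum((m / n) * math.log2((m / n) / ((px[x] / n) * (pc[c] / n)))
               for (x, c), m in pxc.items())

def conditional_mutual_info(xs, cs, ys):
    """I(X; C | Y) = sum over y of P(Y=y) * I(X; C | Y=y), in bits."""
    n = len(ys)
    total = 0.0
    for y, m in Counter(ys).items():
        idx = [i for i in range(n) if ys[i] == y]
        total += (m / n) * mutual_info([xs[i] for i in idx], [cs[i] for i in idx])
    return total
```

With an XOR class `c = x1 ^ x2`, each attribute alone carries 0 bits about the class, so a filter that scores attributes one at a time would discard both. Given the other attribute, however, each one determines the class completely (1 bit).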
Abstract: This paper aims to provide multi-source remote sensing images that are registered in geometric space for image fusion. Focusing on the characteristics of, and differences between, multi-source remote sensing images, a feature-based registration algorithm is implemented. The key technologies include an image scale space for multi-scale properties, Harris corner detection for keypoint extraction, and the partial intensity invariant feature descriptor (PIIFD) for keypoint description. On this basis, a multi-scale Harris-PIIFD image registration framework is proposed. Experimental results on fifteen sets of representative real data show that the algorithm delivers excellent, stable performance in multi-source remote sensing image registration and achieves accurate spatial alignment. It also has strong practical value and some generalization ability.
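For the keypoint extraction step, here is a minimal NumPy sketch of the standard Harris corner measure R = det(M) - k·trace(M)², evaluated at several Gaussian integration scales as a simple stand-in for a scale space. The constant k = 0.04 and the scales are common defaults, not values from the paper. PIIFD description and matching are not shown.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing with zero padding.

    np.convolve(mode="same") keeps the image size only if each side is
    at least as long as the kernel (6 * sigma + 1 pixels, roughly).
    """
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def harris_response(img, sigma=1.5, k=0.04):
    """Harris measure from the Gaussian-weighted structure tensor M."""
    img = np.asarray(img, dtype=float)
    iy, ix = np.gradient(img)                 # derivatives along rows (y) and columns (x)
    sxx = smooth(ix * ix, sigma)
    syy = smooth(iy * iy, sigma)
    sxy = smooth(ix * iy, sigma)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def multiscale_harris(img, sigmas=(1.0, 2.0, 4.0), k=0.04):
    """One Harris response map per integration scale."""
    return [harris_response(img, s, k) for s in sigmas]
```

On a bright square, R is positive near the four corners, negative along the edges, and zero in flat regions. Keypoints are then typically taken as local maxima of R above a threshold at each scale.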
Funding: Supported in part by the Post Doctoral Natural Science Foundation of China (2013M532118, 2015T81082), the National Natural Science Foundation of China (61573364, 61273177, 61503066), the State Key Laboratory of Synthetical Automation for Process Industries, the National High Technology Research and Development Program of China (2015AA043802), and the Scientific Research Fund of Liaoning Provincial Education Department (L2013272).
Abstract: The strong mechanical vibration and acoustic signals produced by the grinding process in ball mills contain useful information related to load parameters. Extracting latent features and constructing a soft sensor model from the high-dimensional frequency spectra of these signals is challenging. This paper develops a selective ensemble modeling approach, based on nonlinear latent frequency spectral feature extraction, for accurate measurement of the material-to-ball volume ratio. Latent features are first extracted from different vibration and acoustic spectral segments by kernel partial least squares. Bootstrap sampling and least squares support vector machines are then employed to produce candidate sub-models, using these latent features as inputs. Ensemble sub-models are selected with a genetic algorithm optimization toolbox. Partial least squares regression is used to combine these sub-models and eliminate collinearity among their prediction outputs. Results indicate that the proposed approach has better prediction performance than previous approaches.
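The candidate sub-models combine bootstrap resampling with least squares support vector machines. Below is a minimal sketch of standard LS-SVM regression with an RBF kernel, which reduces training to a single linear system, plus a bootstrap loop that produces candidates. The regularization `gamma`, the kernel `width`, and the plain inputs `x` are assumptions. In the paper the inputs are kernel PLS latent features, and genetic-algorithm selection and PLS combination follow; neither is shown here.

```python
import numpy as np

def rbf_kernel(a, b, width):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def lssvm_fit(x, y, gamma=10.0, width=1.0):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    a = np.zeros((n + 1, n + 1))
    a[0, 1:] = 1.0
    a[1:, 0] = 1.0
    a[1:, 1:] = rbf_kernel(x, x, width) + np.eye(n) / gamma
    sol = np.linalg.solve(a, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                          # bias b, dual weights alpha

def lssvm_predict(x_train, b, alpha, x_new, width=1.0):
    """f(x) = sum_i alpha_i k(x, x_i) + b."""
    return rbf_kernel(x_new, x_train, width) @ alpha + b

def bootstrap_submodels(x, y, n_models=5, seed=0, gamma=10.0, width=1.0):
    """Fit one candidate LS-SVM per bootstrap resample of (x, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))  # sample rows with replacement
        b, alpha = lssvm_fit(x[idx], y[idx], gamma, width)
        models.append((x[idx], b, alpha))
    return models
```

The first row of the system enforces sum(alpha) = 0. Averaging the candidates' predictions is used below only as a placeholder for the paper's GA selection and PLS combination.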
Funding: Supported by the National Natural Science Foundation of China (62273354, 61673387, 61833016).
Abstract: As a dynamic projection to latent structures (PLS) method with good output prediction ability, dynamic inner PLS (DiPLS) is widely used to predict key performance indicators. However, because DiPLS decomposes the input space obliquely, false alarms occur during fault detection in real industrial processes. To address this problem, a dynamic modeling method based on autoregressive dynamic inner total PLS (AR-DiTPLS) is proposed. The method first uses the regression relation matrix to decompose the input space orthogonally, which reduces information irrelevant to the predicted output in the quality-related dynamic subspace. A vector autoregressive (VAR) model is then constructed for the prediction scores to separate dynamic information from static information. Based on the VAR model, appropriate statistical indicators are constructed for online monitoring, which reduces the occurrence of false alarms. The effectiveness of the method is verified on the Tennessee Eastman industrial simulation process and a three-phase flow system.
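To make the VAR step concrete, the following minimal sketch fits a first-order VAR to a sequence of score vectors by least squares. The one-step-ahead residuals serve as the "static" part, and a Hotelling T² statistic is computed on them. The order (1), the absence of an intercept, and the use of T² on residuals are all assumptions. The abstract does not specify the paper's indicators, and control limits are omitted.

```python
import numpy as np

def fit_var1(t):
    """Least-squares VAR(1) with rows as time samples: t[k] ≈ t[k-1] @ A."""
    past, present = t[:-1], t[1:]
    a, *_ = np.linalg.lstsq(past, present, rcond=None)
    return a

def var1_residuals(t, a):
    """One-step-ahead prediction errors (the part the VAR model does not explain)."""
    return t[1:] - t[:-1] @ a

def hotelling_t2(resid, cov):
    """T^2_i = r_i^T cov^{-1} r_i for each residual row r_i."""
    inv = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", resid, inv, resid)
```

Under normal operation the mean T² is close to the number of score dimensions. A residual that is large relative to the estimated covariance produces a large T² and would be flagged once it exceeds a control limit.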
Funding: Supported by the National Natural Science Foundation of China (Nos. 61462046 and 61762052), the Natural Science Foundation of Jiangxi Province (Nos. 20161BAB202049 and 20161BAB204172), the Bidding Project of the Key Laboratory of Watershed Ecology and Geographical Environment Monitoring, NASG (Nos. WE2016003, WE2016013 and WE2016015), the Science and Technology Research Projects of Jiangxi Province Education Department (Nos. GJJ160741, GJJ170632 and GJJ170633), and the Art Planning Project of Jiangxi Province (Nos. YG2016250 and YG2017381).
Abstract: Image registration is an indispensable component of multi-source remote sensing image processing. In this paper, we put forward a remote sensing image registration method that combines an improved multi-scale, multi-direction Harris algorithm with a novel compound feature. Multi-scale circle Gaussian combined invariant moments and multi-direction gray-level co-occurrence matrices are extracted as features for image matching. The proposed algorithm is evaluated on numerous multi-source remote sensing images with noise and illumination changes. Extensive experiments show that the proposed method yields a stable, evenly distributed set of keypoints and obtains robust, accurate correspondences. It is a promising scheme for multi-source remote sensing image registration.
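The compound feature includes multi-direction gray-level co-occurrence matrices (GLCMs). The minimal sketch below counts co-occurring gray levels at a pixel offset for the four conventional directions (0°, 45°, 90°, 135°) and summarizes each matrix by its contrast. The offset convention, the quantization of 8-bit input to `levels` gray levels, and the choice of contrast as the summary statistic are assumptions. The paper's exact GLCM statistics and the invariant-moment part are not shown.

```python
import numpy as np

# (row, col) step for each direction, with row indices growing downward
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def quantize(img, levels=8):
    """Map 8-bit intensities (assumed 0..255) to integer levels 0..levels-1."""
    return np.clip((np.asarray(img, dtype=float) / 256.0 * levels).astype(int), 0, levels - 1)

def glcm(img, angle, distance=1, levels=8):
    """Normalized co-occurrence matrix P[i, j] for one direction and distance."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * distance, dc * distance
    h, w = img.shape
    m = np.zeros((levels, levels))
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                m[img[r, c], img[r2, c2]] += 1
    return m / m.sum()

def contrast(p):
    """GLCM contrast: sum over (i, j) of (i - j)^2 * P[i, j]."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

def multi_direction_glcm_features(img, distance=1, levels=8):
    """Contrast of the GLCM in each of the four directions."""
    return {a: contrast(glcm(img, a, distance, levels)) for a in OFFSETS}
```

For an image of vertical stripes, contrast is 1 horizontally and diagonally but 0 vertically. This directional sensitivity is what a multi-direction GLCM feature adds over a single direction.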
Funding: Supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61876130 and 61932009).
Abstract: Multi-source domain adaptation (MSDA) learns knowledge from multiple source domains and transfers it to an unlabeled target domain. Most existing methods address this problem by minimizing the domain shift through auxiliary distribution alignment objectives, which reduces the effect of domain-specific features. However, without explicitly modeling the domain-specific features, it is difficult to guarantee that the domain-invariant representation extracted from the input domains contains as little domain-specific information as possible. In this work, we present a different perspective on MSDA that uses feature elimination to reduce the influence of domain-specific features. We design two different ways to extract domain-specific features and total features. We then construct domain-invariant representations by eliminating the domain-specific features from the total features. Experimental results on different domain adaptation datasets demonstrate the effectiveness of our method and the generalization ability of our model.
Funding: Supported by the Applied Research Center of Artificial Intelligence, Wuhan College (Grant Number X2020113) and the Wuhan College Research Project (Grant Number KYZ202009).
Abstract: User representation learning is crucial for capturing diverse user preferences. It is also highly challenging, because user intentions are latent and dispersed across complex and varied patterns of user-generated data, and therefore cannot be measured directly. Text-based models can learn user representations by mining latent semantics, which enriches the semantic content of those representations. However, these techniques only extract common features from historical records and cannot represent changes in user intentions. Sequential features, in contrast, can express user interests and intentions that change over time. Yet sequential recommendation results based on item-level user representations cannot explain which preference factors drive them. To address these issues, we propose a novel model with Dual-Layer User Representation, named DLUR, in which the user's intention is learned from two different layers of representation. Specifically, the latent semantic layer adds a Transformer-based interactive layer that extracts keywords and key sentences from the text, which serve as a basis for interpretation. The sequence layer uses a Transformer model to encode the user's preference intentions and capture how they change. This dual-layer user model is therefore more comprehensive than a text-only or sequence-only model and can effectively improve recommendation performance. Extensive experiments on five benchmark datasets demonstrate that DLUR outperforms state-of-the-art recommendation models. In addition, case studies demonstrate DLUR's ability to explain its recommendation results.
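The sequence layer relies on Transformer encoding of a user's interaction history. As a hedged illustration of the core operation only, here is single-head scaled dot-product self-attention with a causal mask over a sequence of item embeddings. The embedding size and the random projection matrices `wq`, `wk`, `wv` are assumptions. Positional encodings, multiple heads, feed-forward blocks, and DLUR's latent semantic layer are all omitted, so this is not the DLUR architecture.

```python
import numpy as np

def causal_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (seq_len, d_model) embeddings of a user's interactions, oldest first.
    Position t attends only to positions <= t, so output t summarizes the
    user's history up to that step.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)   # True above the diagonal
    scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because of the mask, changing later interactions leaves earlier outputs unchanged. This is the property that lets a sequence encoder track how preferences evolve step by step.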
Funding: Supported by the National Natural Science Foundation of China (Nos. 61073193 and 61300230), the Key Science and Technology Foundation of Gansu Province (No. 1102FKDA010), the Natural Science Foundation of Gansu Province (No. 1107RJZA188), and the Science and Technology Support Program of Gansu Province (No. 1104GKCA037).
Abstract: To improve text categorization accuracy and reduce the dimensionality of the feature space, this paper proposes a two-stage feature selection method based on a novel category correlation degree (CCD) method and latent semantic indexing (LSI). In the first stage, the novel CCD method selects the most effective features for text classification and is more effective than traditional feature selection methods. The second stage addresses two problems: document representation requires a high-dimensional feature space, and it ignores semantic relations between features, which leads to poor categorization accuracy. LSI is therefore used to replace individual terms with statistically derived conceptual indices. This reveals important correlations between features and reduces the dimensionality of the feature space. Concretely, each feature is first ranked by its importance for classification using the CCD method. A new semantic space over the features is then constructed using LSI. Experimental results show that our method effectively reduces the dimensionality of text vectors and improves text categorization performance.
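The LSI stage is the standard truncated singular value decomposition of a term-by-document matrix. Below is a minimal NumPy sketch, assuming the rows have already been reduced to the CCD-selected terms. The CCD scoring itself is not defined in the abstract and is not shown; the rank `k` is a free parameter.

```python
import numpy as np

def lsi(term_doc, k):
    """Rank-k latent semantic indexing of a terms x documents matrix.

    Returns U_k (terms x k), s_k (k,), and Vt_k (k x docs) such that
    term_doc ≈ U_k @ diag(s_k) @ Vt_k.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def fold_in(doc_vec, u_k, s_k):
    """Map a term-count vector d into concept space: diag(s_k)^-1 @ U_k^T @ d."""
    return (doc_vec @ u_k) / s_k
```

Each document becomes a k-dimensional concept vector rather than a vocabulary-sized term vector, and new documents are folded into the same space before classification. For a matrix with two disjoint term blocks, a rank-2 LSI reconstructs it exactly.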
Funding: Supported by the National Basic Research Program of China (973) (2012CB316402), the National Natural Science Foundation of China (Grant Nos. 61332005 and 61725205), the Research Project of the North Minzu University (2019XYZJK02, 2019xYZJK05, 2017KJ24, 2017KJ25, 2019MS002), the Ningxia first-class discipline and scientific research projects (electronic science and technology, NXYLXK2017A07), the NingXia Provincial Key Discipline Project (Computer Application), and the Provincial Natural Science Foundation of NingXia (NZ17111, 2020AAC03219).
Abstract: Overlapping community detection has become a very active research topic in recent decades, and a plethora of methods have been proposed. However, a common limitation of many existing approaches is that the number of communities K must be predefined manually. We propose a flexible nonparametric Bayesian generative model for count-valued networks that allows K to grow as more data are observed, instead of being fixed in advance. The Indian buffet process is used to model the community assignment matrix Z, and an uncollapsed Gibbs sampler is derived. Because Z is a structured multivariable parameter, summarizing the posterior inference results and estimating the inference quality of Z remain considerable challenges in the literature. In this paper, a graph classifier based on a graph convolutional neural network is used to help summarize the results and estimate the inference quality of Z. We conduct extensive experiments on synthetic and real data and find that, empirically, the traditional posterior summarization strategy is reliable.
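To see how the Indian buffet process lets K grow with the data, here is a minimal sketch of drawing Z from the IBP prior via the usual sequential ("buffet") construction. Row i takes each existing column k with probability m_k / i, where m_k counts the earlier rows using column k, and then opens Poisson(α / i) new columns. The concentration `alpha` is an assumed input. This is only the prior, not the paper's network likelihood or its uncollapsed Gibbs sampler.

```python
import numpy as np

def sample_ibp(n_rows, alpha, seed=None):
    """Draw a binary matrix Z (n_rows x K) from the Indian buffet process prior.

    The number of columns K is not fixed in advance; in expectation it
    grows like alpha * (1 + 1/2 + ... + 1/n_rows).
    """
    rng = np.random.default_rng(seed)
    rows = []       # active column indices for each row
    counts = []     # m_k: number of rows that have used column k so far
    for i in range(1, n_rows + 1):
        active = {k for k, m in enumerate(counts) if rng.random() < m / i}
        for k in active:
            counts[k] += 1
        for _ in range(rng.poisson(alpha / i)):   # open new columns
            active.add(len(counts))
            counts.append(1)
        rows.append(active)
    z = np.zeros((n_rows, len(counts)), dtype=int)
    for i, active in enumerate(rows):
        z[i, list(active)] = 1
    return z
```

Every column of a draw is used by at least one row, and each row has a Poisson(α) number of active columns on average. A draw with more rows tends to have more columns, which is the behavior that lets the model infer K rather than fix it.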
Abstract: The Indian buffet process (IBP) and the phylogenetic Indian buffet process (pIBP) can be used as prior models to infer latent features in a dataset. The theoretical properties of these models are under-explored, however, especially in high-dimensional settings. In this paper, we show that under a mild sparsity condition, the posterior distribution of the latent feature matrix, generated via IBP or pIBP priors, converges asymptotically to the true latent feature matrix. We derive the posterior convergence rate, referred to as the contraction rate. We show that the convergence results remain valid even when the dimensionality of the latent feature matrix increases with the sample size, which makes posterior inference valid in high-dimensional settings. We demonstrate the theoretical results by computer simulation, in which the parallel-tempering Markov chain Monte Carlo method is applied to overcome computational hurdles. The practical utility of the derived properties is demonstrated by inferring the latent features of a reverse phase protein array (RPPA) dataset under the IBP prior model.
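Parallel tempering runs several Markov chains targeting π(x)^β at different inverse temperatures β and occasionally swaps states between adjacent chains. This lets the β = 1 chain escape local modes. The minimal sketch below applies it to a one-dimensional bimodal toy density. The target, the temperature ladder, and the step size are assumptions chosen for illustration, not the paper's sampler over latent feature matrices.

```python
import numpy as np

def log_target(x):
    """Unnormalized log density: equal mixture of N(-4, 1) and N(4, 1)."""
    return np.logaddexp(-0.5 * (x + 4) ** 2, -0.5 * (x - 4) ** 2)

def parallel_tempering(n_iter, temps=(1.0, 2.0, 4.0, 8.0), step=1.0, seed=0):
    """Random-walk Metropolis in each tempered chain plus adjacent swaps."""
    rng = np.random.default_rng(seed)
    betas = 1.0 / np.asarray(temps)
    x = np.zeros(len(betas))                  # current state of each chain
    samples = np.empty(n_iter)
    for it in range(n_iter):
        # Metropolis update of every chain, chain j targeting pi(x)^beta_j
        prop = x + step * rng.standard_normal(len(x))
        log_acc = betas * (log_target(prop) - log_target(x))
        x = np.where(np.log(rng.random(len(x))) < log_acc, prop, x)
        # propose swapping the states of a random adjacent pair of chains
        j = rng.integers(len(x) - 1)
        log_swap = (betas[j] - betas[j + 1]) * (log_target(x[j + 1]) - log_target(x[j]))
        if np.log(rng.random()) < log_swap:
            x[j], x[j + 1] = x[j + 1], x[j]
        samples[it] = x[0]                    # keep only the beta = 1 chain
    return samples
```

A single random-walk chain with this step size would usually stay in the mode it starts near. With tempering, the β = 1 samples visit both modes in roughly equal proportion.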