Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61572226 and 61876069, and the Key Scientific and Technological Research and Development Project of Jilin Province of China under Grant Nos. 20180201067GX and 20180201044GX.
Abstract: Topic modeling is a mainstream and effective technology for processing text data, with wide applications in text analysis, natural language processing, personalized recommendation, computer vision, etc. Among known topic models, supervised Latent Dirichlet Allocation (sLDA) is acknowledged as a popular and competitive supervised topic model. However, the growing scale of datasets makes sLDA increasingly inefficient and time-consuming, and restricts its applications to a narrow range. To address this, a parallel online sLDA, named PO-sLDA (Parallel and Online sLDA), is proposed in this study. It uses stochastic variational inference as the learning method to make the training procedure more rapid and efficient, and a parallel computing mechanism implemented via the MapReduce framework is proposed to exploit the capacity of cloud computing and big data processing. The online training capability of PO-sLDA expands the application scope of this approach, making it suitable for real-life applications with high real-time demands. Validation on two datasets of different sizes shows that the proposed approach achieves accuracy comparable to sLDA and efficiently accelerates the training procedure. Moreover, its good convergence and online training capability make it attractive for large-scale text data analysis and processing.
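As a rough illustration of the learning method named above, the sketch below shows a plain stochastic variational inference update for LDA: sample a document, fit its local variational parameters, and blend the resulting noisy estimate into the global topic-word parameters with a decaying step size. All names and hyperparameters (K, tau0, kappa, local_step, ...) are illustrative assumptions; the paper's PO-sLDA additionally handles the supervised response variable and distributes this loop via MapReduce.

```python
import numpy as np
from scipy.special import psi  # digamma function

# A minimal SVI-for-LDA sketch (illustrative, not the paper's code).
K, V, D = 10, 1000, 5000                   # topics, vocabulary, corpus size
alpha, eta = 0.1, 0.01                     # Dirichlet priors
rng = np.random.default_rng(0)
lambda_ = rng.gamma(100.0, 0.01, (K, V))   # global topic-word parameters

def local_step(doc_ids, doc_cts, lam, n_iter=20):
    """E-step for one document: fit its topic proportions gamma and
    return the word-topic responsibilities phi for the observed words."""
    gamma = np.ones(K)
    Elog_beta = psi(lam[:, doc_ids]) - psi(lam.sum(1))[:, None]
    for _ in range(n_iter):
        Elog_theta = psi(gamma) - psi(gamma.sum())
        phi = np.exp(Elog_theta[:, None] + Elog_beta)
        phi /= phi.sum(0, keepdims=True)
        gamma = alpha + (phi * doc_cts).sum(1)
    return phi

def svi_step(doc_ids, doc_cts, lam, t, tau0=1.0, kappa=0.7):
    """Blend a noisy as-if-whole-corpus estimate into the global
    parameters with the decreasing rate rho_t = (t + tau0)^(-kappa)."""
    rho = (t + tau0) ** (-kappa)
    phi = local_step(doc_ids, doc_cts, lam)
    lam_hat = np.full_like(lam, eta)
    lam_hat[:, doc_ids] += D * phi * doc_cts
    return (1 - rho) * lam + rho * lam_hat

# Toy online usage: documents arrive one at a time as (word ids, counts).
for t in range(100):
    ids = rng.choice(V, size=50, replace=False)
    cts = rng.integers(1, 5, size=50).astype(float)
    lambda_ = svi_step(ids, cts, lambda_, t)
```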
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61170092, 61133011 and 61103091.
Abstract: Stochastic variational inference (SVI) can learn topic models from very large corpora. It optimizes the variational objective using the stochastic natural gradient algorithm with a decreasing learning rate. This rate is crucial for SVI; however, it is often tuned by hand in real applications. To address this, we develop a novel algorithm that adaptively tunes the learning rate at each iteration. The proposed algorithm uses the Kullback-Leibler (KL) divergence to measure the similarity between the variational distribution after a noisy update and that after a batch update, and then chooses the learning rate that minimizes this KL divergence. We apply our algorithm to two representative topic models: latent Dirichlet allocation and the hierarchical Dirichlet process. Experimental results indicate that our algorithm performs better and converges faster than commonly used learning rates.
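The core idea can be sketched as follows: treat the (unavailable) batch update as the target, approximate it with running moments of the noisy natural gradients, and pick the step size that a second-order surrogate of the divergence between the two updated distributions would minimize. The moment-matching rule below is one such surrogate, not the paper's exact KL-based derivation; all names are hypothetical.

```python
import numpy as np

class AdaptiveRate:
    """Running-moment surrogate for choosing the SVI step size
    (hypothetical names; not the paper's exact KL-based update)."""
    def __init__(self, dim, tau=10.0):
        self.g_bar = np.zeros(dim)  # running mean of noisy natural gradients
        self.h_bar = 1.0            # running mean of their squared norms
        self.tau = tau              # effective memory of the running means

    def rate(self, g):
        """Return a large step when g agrees with its running mean
        (low noise) and a small step when the noise dominates."""
        self.g_bar += (g - self.g_bar) / self.tau
        self.h_bar += (g @ g - self.h_bar) / self.tau
        rho = float(self.g_bar @ self.g_bar / self.h_bar)
        self.tau = self.tau * (1.0 - rho) + 1.0  # lengthen memory as noise drops
        return rho

# Toy usage: noisy gradients scattered around a fixed direction.
rng = np.random.default_rng(0)
ar = AdaptiveRate(dim=5)
lam = np.zeros(5)
for t in range(200):
    g = np.array([1.0, 0.0, 0.0, 0.0, 0.0]) + 0.5 * rng.normal(size=5)
    lam += ar.rate(g) * g
```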
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61922076, 61873252, 61725304, and 61973324; in part by the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2021B1515020094; and in part by the Guangdong Provincial Key Laboratory of Computational Science under Grant No. 2020B1212060032.
Abstract: Stochastic variational inference is an efficient Bayesian inference technique for massive datasets, which approximates posteriors using noisy gradient estimates. Traditional stochastic variational inference can only be performed in a centralized manner, which limits its applicability in the many situations where data is held by multiple nodes. Therefore, this paper develops a novel trust-region based stochastic variational inference algorithm for a general class of conjugate-exponential models over distributed and asynchronous networks, in which the global parameters are diffused over the network using the Metropolis rule and the local parameters are updated using the trust-region method. In addition, a simple rule is introduced to balance the transmission frequencies between neighboring nodes so that the proposed distributed algorithm can run asynchronously. The utility of the proposed algorithm is tested by fitting the Bernoulli model and the Gaussian model to different datasets on a synthetic network, and experimental results demonstrate its effectiveness and advantages over existing works.
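The Metropolis diffusion step mentioned above is standard and easy to make concrete: neighbors on an undirected graph exchange their copies of the global parameters and average them with weights w_ij = 1/(1 + max(deg_i, deg_j)), which yields a symmetric, doubly stochastic mixing matrix. The sketch below shows only this diffusion step on a small synthetic graph; the trust-region local updates and the asynchronous transmission rule of the paper are omitted.

```python
import numpy as np

# Synthetic undirected network (illustrative, not the paper's test network).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
n = 4
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

# Metropolis rule: w_ij = 1 / (1 + max(deg_i, deg_j)) for neighbors,
# w_ii = 1 - sum_j w_ij, and zero otherwise.
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
W += np.diag(1.0 - W.sum(axis=1))

# Each node holds its own copy of the global natural parameters.
rng = np.random.default_rng(0)
params = rng.normal(size=(n, 8))

for _ in range(100):
    # A local step (trust-region update from local data) would go here;
    # the diffusion step then mixes the estimates across the network.
    params = W @ params

# Repeated mixing drives all nodes to consensus on the network average.
print(np.allclose(params, params.mean(axis=0), atol=1e-6))
```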
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 11771257 and 12271468) and the Natural Science Foundation of Shandong Province (Grant Nos. ZR2021MA010 and ZR2021ZD03).
Abstract: We present a new category of physics-informed neural networks called the physics-informed variational embedding generative adversarial network (PI-VEGAN), which effectively tackles the forward, inverse, and mixed problems of stochastic differential equations. In these scenarios, the governing equations are known, but only a limited number of sensor measurements of the system parameters are available. We integrate the governing physical laws into PI-VEGAN with automatic differentiation, while introducing a variational encoder to approximate the latent variables of the actual distribution of the measurements. These latent variables are fed into the generator to facilitate accurate learning of the characteristics of the stochastic differential equations. Our model consists of three components, namely the encoder, generator, and discriminator, each of which is updated alternately using the stochastic gradient descent algorithm. We evaluate the effectiveness of PI-VEGAN in addressing forward, inverse, and mixed problems that require the concurrent calculation of system parameters and solutions. Numerical results demonstrate that the proposed method achieves satisfactory stability and accuracy in comparison with the previous physics-informed Wasserstein generative adversarial network (PI-WGAN).
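A minimal sketch of the alternating three-component training described above is given below, on a toy deterministic equation u''(x) = -sin(x) with noisy sensor data. The architectures, the WGAN-style critic losses (without stabilizers such as a gradient penalty), and all hyperparameters are illustrative assumptions rather than the paper's configuration; what the sketch demonstrates is the pattern of a reparameterized variational encoder, a physics residual obtained by automatic differentiation, and alternating stochastic-gradient updates.

```python
import torch
from torch import nn

def mlp(n_in, n_out):
    # Small fully connected network; the paper's architectures may differ.
    return nn.Sequential(nn.Linear(n_in, 64), nn.Tanh(),
                         nn.Linear(64, 64), nn.Tanh(),
                         nn.Linear(64, n_out))

latent = 4
encoder = mlp(2, 2 * latent)        # (x, u) -> mean and log-variance of z
generator = mlp(1 + latent, 1)      # (x, z) -> u
discriminator = mlp(2, 1)           # (x, u) -> critic score

opt_e = torch.optim.SGD(encoder.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(generator.parameters(), lr=1e-3)
opt_d = torch.optim.SGD(discriminator.parameters(), lr=1e-3)

def physics_residual(x, z):
    """Residual of the toy equation u''(x) + sin(x) = 0, computed with
    automatic differentiation through the generator."""
    x = x.clone().requires_grad_(True)
    u = generator(torch.cat([x, z], dim=1))
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return d2u + torch.sin(x)

x_s = torch.rand(32, 1) * 6.28                     # sensor locations
u_s = torch.sin(x_s) + 0.05 * torch.randn(32, 1)   # noisy measurements

for step in range(1000):
    # Encoder + generator update: fool the critic, satisfy the physics,
    # and keep the latent posterior close to the N(0, I) prior.
    stats = encoder(torch.cat([x_s, u_s], dim=1))
    mu, logvar = stats[:, :latent], stats[:, latent:]
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
    u_fake = generator(torch.cat([x_s, z], dim=1))
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).mean()
    g_loss = (physics_residual(x_s, z) ** 2).mean() + kl \
             - discriminator(torch.cat([x_s, u_fake], dim=1)).mean()
    opt_e.zero_grad(); opt_g.zero_grad()
    g_loss.backward()
    opt_e.step(); opt_g.step()

    # Discriminator (critic) update: separate measurements from samples.
    d_loss = discriminator(torch.cat([x_s, u_fake.detach()], dim=1)).mean() \
             - discriminator(torch.cat([x_s, u_s], dim=1)).mean()
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
```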