Funding: Supported by the National Natural Science Foundation of China under Grants No. 90920005 and No. 61003192; the Key Project of Philosophy and Social Sciences Research, Ministry of Education, under Grant No. 08JZD0032; the Program of Introducing Talents of Discipline to Universities under Grant No. B07042; the Natural Science Foundation of Hubei Province under Grants No. 2011CDA034 and No. 2009CDB145; the Chenguang Program of Wuhan Municipality under Grant No. 201050231067; and the self-determined research funds of CCNU from the colleges' basic research and operation of MOE under Grants No. CCNU10A02009 and No. CCNU10C01005.
Abstract: This paper focuses on semantic knowledge acquisition from blogs with the proposed tag-topic model. The model extends the Latent Dirichlet Allocation (LDA) model by adding a tag layer between the document and the topic. Each document is represented by a mixture of tags; each tag is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. After parameter estimation, the tags are used to describe the underlying topics, so the latent semantic knowledge within the topics can be represented explicitly. The tags are treated as concepts, and the top-N words from the top topics are selected as related words of the concepts. PMI-IR is then employed to compute the relatedness between each tag-word pair, and noisy words with low correlation are removed to improve the quality of the semantic knowledge. Experimental results show that the proposed method can effectively capture semantic knowledge, especially polysemy and synonymy.
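The PMI-based filtering step can be illustrated with a minimal Python sketch. It assumes relatedness is scored as pointwise mutual information computed from hit counts, which is the usual idea behind PMI-IR; the abstract does not give the exact scoring formula or cutoff. The function names, the toy counts, and the threshold are all hypothetical.

```python
import math

def pmi(hits_xy, hits_x, hits_y, total):
    """Pointwise mutual information (base 2) estimated from hit counts.

    hits_xy: documents containing both terms; hits_x, hits_y: documents
    containing each term; total: size of the collection (all assumed inputs).
    """
    if hits_xy == 0 or hits_x == 0 or hits_y == 0:
        return float("-inf")  # never co-occur: treat as unrelated
    return math.log2((hits_xy * total) / (hits_x * hits_y))

def filter_related_words(tag, candidates, hits, pair_hits, total, threshold=1.0):
    """Keep candidate words whose PMI with the tag exceeds the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    Returns (word, score) pairs sorted from most to least related.
    """
    kept = []
    for word in candidates:
        score = pmi(pair_hits.get((tag, word), 0), hits[tag], hits[word], total)
        if score > threshold:
            kept.append((word, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

With made-up counts, a frequent but uninformative word such as "the" scores near zero and is dropped, while a word that co-occurs with the tag far more often than chance is kept.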
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 90511007, 40501014 and 90302006); funding from CAREE, Chinese Academy of Sciences (Grant No. 2004102); the Hundred Talents Program of Chinese Academy of Sciences (Grant No. 2004401); and the project for Outstanding Young Scientists of the National Natural Science Foundation of China (Grant No. 40121101).
Abstract: A negative correlation between δ¹⁸O in monsoon precipitation and f, the ratio of precipitable water in the monsoon region to that in the water source area, is hypothesized. Using the Rayleigh model, a new method for identifying the origin of summer monsoon rainfall is developed based on this hypothesis. To validate the method, isotopic data at New Delhi, a typical station in the southwest monsoon region, and Hong Kong, a typical station in the southeast monsoon region, were collected and analyzed as case studies. The case studies indicate that the water source areas of the monsoon rainfall at the two stations, as identified by the method, are consistent with the general atmospheric circulation patterns. The method developed in this paper is therefore useful for tracing the origin of summer monsoon precipitation.
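The hypothesis stated above can be checked on data with a short Python sketch. It only computes f as the ratio of the two precipitable-water series and the Pearson correlation between f and δ¹⁸O; it does not reproduce the paper's Rayleigh-based identification procedure, which the abstract does not detail. The function names and any input values are hypothetical.

```python
import math

def precipitable_water_ratio(pw_monsoon, pw_source):
    """f for each time step: precipitable water in the monsoon region
    divided by precipitable water in a candidate water source area."""
    return [m / s for m, s in zip(pw_monsoon, pw_source)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def supports_hypothesis(delta18o, pw_monsoon, pw_source):
    """True when delta-18O and f are negatively correlated for this
    candidate source area (a simple sign check, an assumed criterion)."""
    f = precipitable_water_ratio(pw_monsoon, pw_source)
    return pearson_r(delta18o, f) < 0
```

A candidate source area for which the correlation is not negative would be inconsistent with the hypothesis under this simple check.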
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61170092, 61133011, and 61103091).
Abstract: This paper develops a novel online algorithm, namely moving average stochastic variational inference (MASVI), which applies the results obtained by previous iterations to smooth out noisy natural gradients. We analyze the convergence property of the proposed algorithm and conduct a set of experiments on two large-scale collections that contain millions of documents. Experimental results indicate that, compared with the stochastic variational inference and SGRLD algorithms, our algorithm achieves a faster convergence rate and better performance.
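The idea of smoothing noisy natural gradients with results from earlier iterations can be sketched in Python as follows. This is one plausible reading, not the paper's update rule: it assumes a scalar global parameter, a sliding-window average of the most recent noisy natural gradients, and a decaying step size of the form (t + tau)^(-kappa). The function name, window size, and step-size constants are hypothetical.

```python
from collections import deque
import random

def moving_average_svi(noisy_estimate, lam0, steps, window=10, tau=10.0, kappa=0.7):
    """Stochastic update of a global parameter using averaged gradients.

    noisy_estimate(): returns a noisy minibatch estimate of the optimal
    parameter (an assumed stand-in for the local step of variational inference).
    """
    lam = lam0
    recent = deque(maxlen=window)  # keeps only the last `window` gradients
    for t in range(1, steps + 1):
        # Noisy natural gradient: direction from current value to the estimate.
        recent.append(noisy_estimate() - lam)
        smoothed = sum(recent) / len(recent)  # moving average over the window
        rho = (t + tau) ** (-kappa)           # decaying step size
        lam = lam + rho * smoothed
    return lam

# Toy usage: noisy observations centred on a hypothetical target value of 3.0.
rng = random.Random(0)
estimate = moving_average_svi(lambda: 3.0 + rng.gauss(0.0, 1.0), lam0=0.0, steps=2000)
```

Averaging over a window trades some staleness in the gradient for lower variance, which is the intuition the abstract describes.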