Stochastic Variational Inference-Based Parallel and Online Supervised Topic Model for Large-Scale Text Processing (cited by: 1)

Abstract: Topic modeling is a mainstream and effective technique for handling text data, with wide applications in text analysis, natural language processing, personalized recommendation, computer vision, etc. Among known topic models, supervised Latent Dirichlet Allocation (sLDA) is acknowledged as a popular and competitive supervised topic model. However, as datasets grow, sLDA becomes increasingly inefficient and time-consuming, which restricts its applications to a narrow range. To address this, a parallel online sLDA, named PO-sLDA (Parallel and Online sLDA), is proposed in this study. It uses stochastic variational inference as the learning method to make training faster and more efficient, and a parallel computing mechanism implemented via the MapReduce framework is proposed to support cloud computing and big data processing. The online training capacity of PO-sLDA expands the application scope of the approach, making it useful for real-life applications with high real-time demands. Validation on two datasets of different sizes shows that the proposed approach achieves accuracy comparable to sLDA and efficiently accelerates training. Moreover, its good convergence and online training capacity make it attractive for large-scale text data analysis and processing.
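As a rough illustration of the two ideas named in the abstract (a stochastic variational update and a map/reduce split of the work), the sketch below shows one mini-batch update of a topic-word parameter table. It is not the PO-sLDA algorithm from the paper: it follows the generic stochastic variational inference pattern for unsupervised LDA, omits the supervised response variable, and replaces the full local variational step with a simplified one. All names (`step_size`, `map_doc`, `reduce_stats`, `svi_update`, `tau0`, `kappa`, `eta`) are hypothetical and chosen only for this example.

```python
from functools import reduce


def step_size(t, tau0=1.0, kappa=0.7):
    """Robbins-Monro step size rho_t = (tau0 + t) ** -kappa, kappa in (0.5, 1]."""
    return (tau0 + t) ** (-kappa)


def map_doc(doc, lam):
    """Map step: expected topic-word counts for one document.

    doc is a dict {word_id: count}; lam is a K x V table (list of lists).
    Simplification (assumption): responsibilities are taken from the
    normalized rows of lam, not from the digamma-based expectations that
    full variational inference would use.
    """
    K, V = len(lam), len(lam[0])
    row_sums = [sum(row) for row in lam]
    stats = [[0.0] * V for _ in range(K)]
    for w, count in doc.items():
        weights = [lam[k][w] / row_sums[k] for k in range(K)]
        total = sum(weights)
        for k in range(K):
            stats[k][w] += count * weights[k] / total
    return stats


def reduce_stats(a, b):
    """Reduce step: element-wise sum of two K x V statistics tables."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def svi_update(lam, batch, t, corpus_size, eta=0.01):
    """Blend the old table with a noisy estimate built from one mini-batch."""
    rho = step_size(t)
    summed = reduce(reduce_stats, (map_doc(d, lam) for d in batch))
    scale = corpus_size / len(batch)  # rescale the batch to the full corpus
    return [
        [(1.0 - rho) * old + rho * (eta + scale * s) for old, s in zip(lrow, srow)]
        for lrow, srow in zip(lam, summed)
    ]


# Usage: two topics, three vocabulary words, a mini-batch of two documents.
lam = [[1.0, 2.0, 1.0], [2.0, 1.0, 1.0]]
batch = [{0: 2, 1: 1}, {2: 3}]
lam_next = svi_update(lam, batch, t=1, corpus_size=100)
```

Because `map_doc` depends only on one document and the current table, the map calls could in principle run on separate workers, with `reduce_stats` combining their outputs; that is the only sense in which this sketch mirrors a MapReduce layout.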

Authors: Yang Li, Wen-Zhuo Song, Bo Yang

Affiliations: College of Computer Science and Technology; Key Laboratory of Symbolic Computation and Knowledge Engineering; Aviation University of Air Force

Source: Journal of Computer Science & Technology (计算机科学技术学报（英文版）), indexed in SCIE, EI, CSCD. 2018, Issue 5, pp. 1007-1022 (16 pages).

Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61572226 and 61876069, and the Key Scientific and Technological Research and Development Project of Jilin Province of China under Grant Nos. 20180201067GX and 20180201044GX.

Keywords: topic modeling, large-scale text classification, stochastic variational inference, cloud computing, online learning

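Classification codes: TP391 [Automation and Computer Technology: Computer Application Technology]; TU311.3 [Architectural Science: Structural Engineering]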
