Research on Topic Mining of Web Text Based on HLDA-IDF Model (cited by: 2)
Abstract [Purpose/significance] The LDA model does not account for the attention a document receives or for its quality when modeling web text. To address this shortcoming and to improve the semantic interpretability and topic-representation ability of the results, this paper proposes HLDA-IDF, a heat-weighted model for mining topics from web text. [Method/process] The paper first gives a reasonably precise definition of heat and applies heat weighting to the LDA model to construct the HLDA model. Then, because terms differ in their ability to represent topics, it introduces and improves the TF-IDF algorithm to construct the HLDA-IDF model. Finally, the model is validated experimentally on real forum data. [Result/conclusion] The experimental results show that the proposed model yields strong semantic interpretability and topic-representation ability.
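Authors 陈斌 (Chen Bin), 马静 (Ma Jing)
Source 《情报理论与实践》 (Information Studies: Theory & Application), indexed in CSSCI and the Peking University Core Journals list, 2017, No. 10, pp. 117-122, 144 (7 pages)
Funding An outcome of the National Natural Science Foundation of China project "Research on Adaptive Tracking Methods for Online Public Opinion Based on Evolving Ontology" (No. 71373123); the Key Project of Philosophy and Social Science Research in Jiangsu Universities "Research on a Multi-Opinion Evolution Model of Jiangsu Education Microblog Public Opinion Based on Supernetworks and Its Application" (No. 2015ZDIXM007); and the Jiangsu Province Graduate Research and Innovation Program for Regular Universities project "Research on Public Opinion Propagation Models and Control Strategies in Social Networks" (No. KYZZ15_0104)
Keywords heat; model; topic mining; web text; text mining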
