
Interactive Topic Modeling Based on Hierarchical Dirichlet Process (cited by: 9)
Abstract: With the rapid development of information technology, large amounts of text data are being produced, collected, and stored. Topic modeling is an important tool for text analysis and is widely used to analyze large text collections. However, topic models usually cannot incorporate users' domain knowledge intuitively and effectively to correct the model results. To address this problem, this paper proposes an interactive visual analysis system that helps users refine generated topic models. First, the hierarchical Dirichlet process is modified to support word constraints. Then, the generated topic model is displayed in a matrix view that reveals the underlying relationships between words and topics, and semantic-preserving word clouds help users find word constraints effectively. Users iteratively refine the topic model by adding word constraints. Finally, the usability of the system is evaluated through case studies and user studies.
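Source: Journal of Software (《软件学报》; indexed in EI, CSCD, PKU Core), 2016, No. 5, pp. 1114-1126 (13 pages)
Funding: National Natural Science Foundation of China (61472354); National High Technology Research and Development Program of China (863 Program) (2012AA12A404)
Keywords: text visualization; topic model; text analysis; hierarchical Dirichlet process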
