
A Topic Text Network Construction Method Based on the PL-LDA Model (cited by: 2)
Abstract: Labeled LDA can mine the probability distribution of words under a given topic, but it cannot analyze the associations among those topic words. PMI (Pointwise Mutual Information) can measure the correlation between two words, but it loses the connection to the given topic. Inspired by the way PMI counts word-pair co-occurrence frequencies within a fixed window, this paper proposes a topic model called PL-LDA (Pointwise Labeled LDA), which computes the joint probability distribution of word pairs under a given topic. Experiments on an aviation safety report dataset show that the results produced by PL-LDA are highly interpretable. A topic text network is then built from the PL-LDA results; besides reflecting the distribution of topic words, the network also displays the complex relationships among them.
Source: Complex Systems and Complexity Science (《复杂系统与复杂性科学》), CSCD, Peking University Core Journal, 2017, No. 1, pp. 52-57, 110 (7 pages)
Funding: National Natural Science Foundation of China (61201414, 61301245, U1233113)
Keywords: topic model; text mining; complex network; PMI
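
The abstract mentions that PMI counts word-pair co-occurrences within a window. As a rough illustration of that counting step only, the sketch below estimates plain PMI from sliding windows over tokenized documents. It is not the PL-LDA model, which additionally conditions on a topic. The function name `window_pmi`, the `window` parameter, and the toy tokens are assumptions made for this example and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def window_pmi(docs, window=3):
    """Estimate PMI (base 2) for word pairs co-occurring in a sliding window.

    docs: list of token lists. Hypothetical helper for illustration only;
    probabilities are estimated as fractions of windows containing the
    word or the pair.
    """
    word_counts = Counter()   # number of windows containing each word
    pair_counts = Counter()   # number of windows containing each word pair
    n_windows = 0
    for tokens in docs:
        if not tokens:
            continue  # skip empty documents
        # Documents shorter than the window contribute a single window.
        for start in range(max(1, len(tokens) - window + 1)):
            win = set(tokens[start:start + window])
            n_windows += 1
            word_counts.update(win)
            # Sorted so each pair has one canonical key (a, b) with a < b.
            pair_counts.update(combinations(sorted(win), 2))
    pmi = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_windows
        p_a = word_counts[a] / n_windows
        p_b = word_counts[b] / n_windows
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Toy usage: "a" and "b" always appear together, as do "c" and "d".
scores = window_pmi([["a", "b"], ["a", "b"], ["c", "d"], ["c", "d"]], window=2)
print(scores[("a", "b")])
```

Pairs that never share a window get no entry in the result, which is one reason PMI on its own says nothing about how a pair relates to a given topic.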

