Abstract
Labeled LDA can estimate the probability distribution of words under a given topic, but it cannot analyze the associations among those topic words. PMI (Pointwise Mutual Information) can measure the correlation between two words, but it loses the connection to the given topic. Motivated by the way PMI counts word-pair co-occurrences within a fixed window, this paper proposes a topic model called PL-LDA (Pointwise Labeled LDA), which computes the joint probability distribution of word pairs under a given topic. Experiments on an aviation safety report dataset show that the results produced by PL-LDA are highly interpretable. Based on PL-LDA, the paper constructs a topic text network that not only reflects the distribution of topic words but also reveals the complex relationships among them.
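The windowed co-occurrence counting that motivates PL-LDA can be illustrated with plain PMI. The sketch below is not the paper's PL-LDA model; it is a minimal, topic-agnostic PMI computation, assuming pre-tokenized text, unordered word pairs, and simple relative-frequency estimates. The function name `window_pmi`, the `window` parameter, and the toy tokens are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def window_pmi(tokens, window=5):
    """Return PMI (base 2) for unordered word pairs that co-occur in a sliding window.

    Assumption: probabilities are relative frequencies, with p(a, b) taken
    over all counted pairs and p(a), p(b) taken over all tokens.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Pair the token at position i with the next (window - 1) tokens.
        for v in tokens[i + 1 : i + window]:
            if v != w:
                pair_counts[tuple(sorted((w, v)))] += 1

    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    pmi = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = word_counts[a] / n_words
        p_b = word_counts[b] / n_words
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Toy tokens, invented for illustration (not taken from the paper's dataset).
tokens = ["engine", "failure", "engine", "failure", "runway", "fog"]
scores = window_pmi(tokens, window=2)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Note that these scores carry no topic information: the same pair gets the same score regardless of which label a report carries, which is the limitation the abstract attributes to PMI.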
Source
《复杂系统与复杂性科学》
CSCD
Peking University Core Journal (北大核心)
2017, No. 1, pp. 52-57, 110 (7 pages)
Complex Systems and Complexity Science
Funding
National Natural Science Foundation of China (61201414, 61301245, U1233113)