
Outlier Review Selection Based on Improved LDA Model (Cited by: 1)
Abstract: Words in review text follow a power-law distribution, which biases the word distributions learned by the LDA model toward high-frequency words; as a result, the topics become highly similar and their expressive power drops. A power-function weighted LDA (Latent Dirichlet Allocation) model is therefore proposed to improve the expressive power of low-frequency words. The iForest algorithm is then used to select a distinctive and valuable subset of reviews. Experimental results show that the selected review subset achieves high feature coverage and high average information.
Author: 董振涛
Source: Software Guide (《软件导刊》), 2018, No. 1, pp. 38-40 (3 pages)
Keywords: LDA; iForest; feature coverage; average information
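
The abstract describes a two-stage pipeline: re-weight words with a power function so that low-frequency words carry more weight in LDA, then run iForest (Isolation Forest) on the per-review topic distributions to pick out outlier reviews. Below is a minimal Python sketch of that idea; the paper's exact weighting formula is not reproduced, and the sample reviews, exponent, and hyper-parameters are assumptions for illustration only.

```python
# A minimal sketch of the pipeline described in the abstract, assuming a simple
# power-function re-weighting of term counts before LDA. The sample reviews,
# the exponent alpha, and all hyper-parameters below are illustrative
# assumptions, not values from the paper.
import numpy as np
from scipy.sparse import diags
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import IsolationForest

reviews = [
    "battery lasts long and charges fast",
    "screen is bright and the battery is fine",
    "battery life is good, screen looks sharp",
    "fast shipping, battery works as expected",
    "terrible packaging, the box arrived crushed and soaked",
    "great phone overall, battery and screen are solid",
]

# Document-term count matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews).astype(float)

# Power-function weighting (assumed form): down-weight high-frequency words so
# that low-frequency words contribute more to the topic estimates.
corpus_freq = np.asarray(X.sum(axis=0)).ravel()   # total count of each word
alpha = 0.5                                       # assumed weighting exponent
weights = corpus_freq ** (-alpha)
X_weighted = X @ diags(weights)                   # rescale each word's counts

# Fit LDA on the re-weighted counts; take per-review topic distributions.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(X_weighted)

# iForest flags reviews whose topic mixtures are unusual; those outliers are
# taken as the distinctive, high-value review subset.
iforest = IsolationForest(contamination=0.2, random_state=0)
labels = iforest.fit_predict(doc_topics)          # -1 marks an outlier
outliers = [r for r, y in zip(reviews, labels) if y == -1]
print(outliers)
```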


