
Topic-based Probabilistic Document Correlation Model (cited by: 3)
Abstract: Existing document relationship analysis models have difficulty judging document correlation at the topic level. To address this, a topic-based probabilistic document correlation model (TPDC) is proposed. TPDC uses the Latent Dirichlet Allocation model to learn the topic structure of documents, derives the document posterior probability from the computed topic posterior probabilities and topic similarities, and builds the document correlation analysis model on that document posterior probability. Experimental results show that the TPDC model outperforms the vector space model in both retrieval precision and degree of document compression, and is therefore better suited to document retrieval tasks in practical applications.
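Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2008, No. 10, pp. 178-180, 218 (4 pages)
Funding: Natural Science Foundation of Guangdong Province (07006474); Guangdong Province Science and Technology Research Project (2007B010200044)
Keywords: topic, topic similarity, document correlation, text mining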

