
A short text retrieval method combining the Wikipedia category graph and topic features (cited by: 1)
Abstract: The rapid development of social networks has produced a large volume of short text data. Short texts are brief, carry little information, have sparse features, and often use irregular grammar. To address these characteristics, this paper proposes a semantic feature selection and relatedness computation method that uses the structural information in the Wikipedia category graph (WCG) and analyzes the topic features it contains. On this basis, short texts are retrieved and ranked by computing the semantic relatedness between the user query and each target short text. Experiments on a subset of Twitter data show that the method, which combines the Wikipedia category graph with topic features, outperforms several existing retrieval methods on the evaluation metrics MAP, P@k, and R-Prec.
Authors: Li Pu; Xiao Bao; Sun Yusheng; Zhang Zhifeng; Deng Lujuan (Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450000, China; School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou 535000, China)
Source: Journal of Henan Normal University (Natural Science Edition); indexed in CAS and the Peking University Core Journals list; 2019, No. 6, pp. 22-30 (9 pages)
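Funding: National Natural Science Foundation of China, Young Scientists Fund (61802352); National Natural Science Foundation of China (61772210, 61872439); Doctoral Research Fund of Zhengzhou University of Light Industry (0215/13501050015); Young Backbone Teacher Training Program of Zhengzhou University of Light Industry (2018XGGJS006); Qinzhou Scientific Research and Technology Development Program (20189903); Basic Ability Improvement Project for Young and Middle-aged Teachers in Guangxi Universities (KY2019KY0463)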
Keywords: Wikipedia category graph; topic features; short text; information retrieval
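The abstract reports results on MAP, P@k, and R-Prec without defining them. As a minimal sketch of the standard textbook definitions of these metrics (not the authors' evaluation code), the following Python functions compute them for a ranked result list. The function names, the document IDs, and the data layout (a ranked list of IDs per query plus a set of relevant IDs per query) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """P@k: fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """R-Prec: precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP: mean of the precision values at each rank where a relevant item appears,
    normalized by the total number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP: mean of AP over all queries that have relevance judgments."""
    if not qrels:
        return 0.0
    return sum(
        average_precision(runs.get(query, []), rel) for query, rel in qrels.items()
    ) / len(qrels)


# Hypothetical example: one query, four retrieved tweets, two of them relevant.
ranked_ids = ["t1", "t2", "t3", "t4"]
relevant_ids = {"t1", "t3"}
p_at_2 = precision_at_k(ranked_ids, relevant_ids, 2)
r_prec = r_precision(ranked_ids, relevant_ids)
ap = average_precision(ranked_ids, relevant_ids)
```

In this hypothetical example the relevant tweets sit at ranks 1 and 3, so P@2 and R-Prec are both 1/2, and AP is (1/1 + 2/3) / 2. Whether the paper uses exactly these variants, for instance a particular cutoff k, is not stated in the abstract.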