An Information Retrieval Graph Model Based on Term Importance (cited by: 11)
Abstract: Determining the importance of index terms within a document is a central task in information retrieval modeling. Retrieval models that represent documents as a bag of words mostly rely on the term independence assumption: they compute term importance as a function of TF and IDF and ignore the relationships between terms. This paper instead represents documents as a graph of words to capture the dependencies between terms, and proposes TI-IDF, a new graph-based retrieval model built on term importance. From the graph-of-word, the co-occurrence matrix of the terms in a document and the transition probability matrix between terms are derived. A Markov chain computation then yields each term's importance in the document (Term Importance, TI), which replaces the traditional term frequency TF at indexing time. The authors report that the model is more robust, and they compare it with traditional retrieval models on public international datasets. The experimental results show that the proposed model consistently outperforms BM25 and, in most cases, also outperforms extensions of BM25, TW-IDF, and other models.
Authors: WANG Mingwen (王明文), HONG Huan (洪欢), JIANG Aiwen (江爱文), ZUO Jiali (左家莉) (School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list; 2016, No. 4, pp. 134-141 (8 pages)
Funding: National Natural Science Foundation of China (61272212, 61462043, 61462045); Natural Science Foundation of Jiangxi Province (20122BAB211032, 2015BAB217014)
Keywords: term importance; graph-of-word; retrieval model; TI-IDF
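
The pipeline summarized in the abstract (graph-of-word, co-occurrence matrix, transition probability matrix, Markov chain, TI replacing TF) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, since the abstract does not give the details: the sliding-window size, the damping factor used to guarantee convergence, the smoothed IDF formula, and the simple sum-of-products scoring function. The function names are hypothetical and this is not the authors' implementation.

```python
import math
from collections import defaultdict


def cooccurrence_counts(tokens, window=3):
    """Count undirected co-occurrences of distinct terms.

    Assumption: two terms co-occur when they fall inside the same
    sliding window of `window` consecutive tokens.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                counts[term][other] += 1.0
                counts[other][term] += 1.0
    return counts


def term_importance(tokens, window=3, damping=0.85, iterations=100, tol=1e-10):
    """Approximate the stationary distribution of a random walk on the term graph.

    Row-normalizing the co-occurrence counts gives transition probabilities.
    The damping factor is an assumption (PageRank-style smoothing) that keeps
    the chain ergodic; the paper's exact Markov chain method may differ.
    """
    vocab = sorted(set(tokens))
    if not vocab:
        return {}
    counts = cooccurrence_counts(tokens, window)
    n = len(vocab)
    scores = {t: 1.0 / n for t in vocab}
    for _ in range(iterations):
        new = {t: (1.0 - damping) / n for t in vocab}
        for t in vocab:
            total = sum(counts[t].values())
            if total == 0.0:
                # Isolated term: spread its probability mass uniformly.
                for u in vocab:
                    new[u] += damping * scores[t] / n
            else:
                for u, c in counts[t].items():
                    new[u] += damping * scores[t] * c / total
        delta = sum(abs(new[t] - scores[t]) for t in vocab)
        scores = new
        if delta < tol:
            break
    return scores


def idf(term, doc_freq, num_docs):
    """Smoothed IDF (an assumed variant, not taken from the paper)."""
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))


def ti_idf_score(query_terms, doc_tokens, doc_freq, num_docs):
    """Score a document by summing TI * IDF over the query terms."""
    ti = term_importance(doc_tokens)
    return sum(ti.get(t, 0.0) * idf(t, doc_freq, num_docs) for t in query_terms)


# Toy collection, made up purely for illustration.
docs = {
    "d1": "graph model for information retrieval with term graph".split(),
    "d2": "bag of words model for text".split(),
}
doc_freq = defaultdict(int)
for toks in docs.values():
    for t in set(toks):
        doc_freq[t] += 1

query = ["graph", "retrieval"]
ranking = sorted(
    docs,
    key=lambda d: ti_idf_score(query, docs[d], doc_freq, len(docs)),
    reverse=True,
)
print(ranking)
```

In this toy collection only d1 contains the query terms, so it ranks first. A production model would likely add document length normalization, as BM25 does, which this sketch omits.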
