
Term Importance Identification Method Based on Classification (基于分类的term重要性识别方法)
Abstract: In traditional search engines and information retrieval, the term weights of a user query are typically derived in a context-independent fashion. Most existing information retrieval techniques use bag-of-words approaches, such as the Boolean model, the vector space model, and probabilistic models, none of which take the relationships among the terms in a query into account. To make full use of the information in the query and improve the accuracy of term weights, this paper proposes a supervised machine learning method for learning the term weights of a user query. The method is classification-based and uses the result of syntactic parsing as a major feature for training the classifier. Taking the relationships among query terms into account avoids the information loss incurred when a query is reduced to individual terms and adds features for short text. The classifier also produces soft output, which gives a more accurate quantified value for the importance of each term.
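As a rough illustration of the "soft output" idea in the abstract, the sketch below trains a small logistic-regression classifier and uses its predicted probability as a term weight. It is not the paper's implementation: the three features (`is_root`, `depth`, `is_noun`), the toy training data, and the function names `train_logistic` and `term_weight` are all hypothetical, and logistic regression is only one possible classifier with probabilistic output.

```python
import math

# Hypothetical per-term features, assumed to come from a dependency parse:
#   [is_root (0/1), depth in the parse tree, is_noun (0/1)]
# Labels: 1 = important term, 0 = unimportant term. Toy data, not from the paper.
TRAIN = [
    ([1.0, 0.0, 1.0], 1),
    ([0.0, 1.0, 1.0], 1),
    ([1.0, 0.0, 0.0], 1),
    ([0.0, 2.0, 0.0], 0),
    ([0.0, 3.0, 0.0], 0),
    ([0.0, 2.0, 1.0], 0),
]


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, epochs=2000, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def term_weight(model, x):
    """Soft output: probability that the term is important, used as its weight."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


MODEL = train_logistic(TRAIN)

if __name__ == "__main__":
    # Hypothetical query terms with hand-assigned parse features.
    query = {"hotel": [1.0, 0.0, 1.0], "the": [0.0, 3.0, 0.0]}
    for term, feats in query.items():
        print(term, round(term_weight(MODEL, feats), 3))
```

The point of the soft output is that the weight is a graded value in (0, 1) rather than a hard important/unimportant label, so terms within a query can be ranked against each other.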
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2013, No. 11, pp. 242-247 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (70971059) and the Liaoning Province Innovation Team Project (2009T045)
Keywords: Classification, Dependency parsing, Term-weight, Query analysis, Term importance, Search engine, Information retrieval

