Term Importance Identification Method Based on Classification

Abstract: In traditional search engines and information retrieval, the term weights of a user query are typically derived in a context-independent fashion. Most existing information retrieval techniques use bag-of-words approaches, such as the Boolean model, the vector space model, and probabilistic models, none of which consider the relationships among the terms in a query. To exploit the information in the query more fully and improve the accuracy of term weights, this paper proposes a supervised machine learning method for learning the term weights of a user query. The method is based on classification and introduces syntactic parsing as a major feature for training the model. Taking the relationships among query terms into account avoids the information loss incurred when a query is reduced to individual terms, and it adds features to the short text. The classifier also produces soft output, which gives a more accurate quantitative value for the importance of each term.
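The idea of a classifier with soft output can be illustrated with a minimal sketch. The abstract does not name the classifier or list the full feature set, so everything concrete here is an assumption: logistic regression stands in for the classifier, the three per-term features (`is_dependency_root`, `is_noun`, `normalised_idf`) are hypothetical, and the training rows are toy values made up for illustration. The predicted probability of the "important" class is used directly as the term weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    n_features = len(samples[0])
    n = len(samples)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def term_weight(features, w, b):
    """Soft output: probability that the term is important, in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)


# Hypothetical features per term: [is_dependency_root, is_noun, normalised_idf].
# Toy labels: 1 = important term, 0 = unimportant term.
TRAIN_X = [
    [1, 1, 0.9],
    [1, 0, 0.7],
    [0, 1, 0.8],
    [0, 0, 0.2],
    [0, 0, 0.1],
    [0, 1, 0.3],
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

W, B = train_logistic(TRAIN_X, TRAIN_Y)

# Feature vectors for the terms of one example query (values are illustrative).
query_terms = {
    "cheap": [0, 0, 0.3],
    "flights": [1, 1, 0.8],
    "to": [0, 0, 0.05],
}
weights = {term: term_weight(f, W, B) for term, f in query_terms.items()}
for term, weight in weights.items():
    print(f"{term}: {weight:.3f}")
```

A hard classifier would only label each term as important or not. Reading the class probability instead yields a graded weight per term, which is the property the abstract refers to as soft output.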

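Authors: Qiu Yunfei, Bao Li, Shao Liangshan

Affiliations: School of Software, Liaoning Technical University; Institute of Systems Engineering, Liaoning Technical University

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 11, pp. 242-247 (6 pages)

Funding: National Natural Science Foundation of China (70971059); Liaoning Province Innovation Team Project (2009T045)

Keywords: Classification, Dependency parsing, Term-weight, Query analysis, Term importance, Search engine, Information retrieval

Classification number: TP311.1 [Automation and Computer Technology: Computer Software and Theory]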
