Abstract
In traditional search engines and information retrieval, the weights of the terms in a user query are usually derived in a context-independent fashion. Most existing information retrieval techniques use bag-of-words approaches, such as the Boolean model, the vector space model, and probabilistic ranking models. All of these treat terms independently and ignore the relationships among the terms of a query. To make full use of the information in the query and improve the accuracy of term weights, this paper proposes a supervised machine learning method that derives a context-sensitive, query-dependent weight for each term in a search query. The method is classification-based and uses the result of syntactic parsing as a major feature for training the model. Taking the relationships among query terms into account avoids the information lost when a query is reduced to individual terms, and it adds features to otherwise short query text. In addition, the classifier produces a soft output, which gives a more accurate quantitative value for the importance of each term.
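The abstract states only that the method is classification-based, uses syntactic (dependency) parsing as a major feature, and takes the classifier's soft output as the term weight. The following is a minimal sketch under explicit assumptions, not the paper's implementation: logistic regression stands in for the unspecified classifier, the dependency parses are written by hand rather than produced by a parser, and the relation labels, IDF values, feature set, and importance labels are all hypothetical. The IDF value is the context-independent weight a bag-of-words model would use; the parse features (number of dependents, depth below the root, relation label) let the same word receive a different weight depending on its role in the query.

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head within the query; -1 marks the root
    rel: str   # dependency relation to the head (hypothetical label set)


# Hypothetical relation inventory used for one-hot features.
RELATIONS = ["root", "nmod", "amod", "det", "case"]

# Toy, context-independent IDF values (illustrative numbers, not from the paper).
IDF = {"cheap": 2.0, "flights": 3.0, "to": 0.2, "beijing": 4.0,
       "the": 0.1, "best": 1.5, "python": 3.5, "tutorial": 2.5}


def features(query: list[Token], i: int) -> list[float]:
    """Bias + IDF + features read off the dependency parse of the whole query."""
    tok = query[i]
    n_children = sum(1 for t in query if t.head == i)
    depth, j = 0, i
    while query[j].head != -1 and depth < len(query):  # walk head links up to the root
        j = query[j].head
        depth += 1
    one_hot = [1.0 if tok.rel == r else 0.0 for r in RELATIONS]
    return [1.0, IDF.get(tok.text, 0.0), float(n_children), float(depth)] + one_hot


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], x: list[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train(data, epochs: int = 500, lr: float = 0.1) -> list[float]:
    """Logistic regression fitted by stochastic gradient descent on log loss."""
    w = [0.0] * (4 + len(RELATIONS))
    for _ in range(epochs):
        for query, labels in data:
            for i, y in enumerate(labels):
                x = features(query, i)
                err = score(w, x) - y  # derivative of log loss w.r.t. the logit
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def term_weights(w: list[float], query: list[Token]) -> list[tuple[str, float]]:
    """Soft output: P(term is important | query) used as a weight in (0, 1)."""
    return [(t.text, score(w, features(query, i))) for i, t in enumerate(query)]


# Hand-written parses and binary importance labels (hypothetical training data).
TRAIN = [
    ([Token("cheap", 1, "amod"), Token("flights", -1, "root"),
      Token("to", 3, "case"), Token("beijing", 1, "nmod")], [0, 1, 0, 1]),
    ([Token("the", 3, "det"), Token("best", 3, "amod"),
      Token("python", 3, "nmod"), Token("tutorial", -1, "root")], [0, 0, 1, 1]),
]

w = train(TRAIN)
result = term_weights(w, TRAIN[0][0])
for term, p in result:
    print(f"{term:10s} {p:.3f}")
```

Returning a probability rather than a hard 0/1 label is what makes the output usable as a graded weight. In a real system the hand-written `Token` lists would come from a dependency parser and the labels from annotated queries.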
Source
Computer Science (《计算机科学》)
CSCD (Chinese Science Citation Database)
Peking University core journal
2013, Issue 11, pp. 242-247 (6 pages)
Funding
National Natural Science Foundation of China (70971059)
Liaoning Province Innovation Team Project (2009T045)
Keywords
Classification, Dependency parsing, Term weight, Query analysis, Term importance, Search engine, Information retrieval