Abstract
Web query classification is important for improving the search quality of search engines. By analyzing and manually labeling real user query logs, we found that four kinds of characteristic words, which we call "VASE" characteristic words, play a decisive role in query classification. We extract these words and build an inverted index over them, which is used to classify queries by topic. On this basis, we propose methods based on web extension and weighted characteristic words to improve the classification results. Experimental results show that precision and recall reach 78.2% and 77.3%, respectively, meeting practical requirements.
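As a rough illustration of classifying queries through an inverted index of weighted characteristic words, the idea might be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `build_index` and `classify`, the additive scoring rule, the substring matching, and the toy words, categories, and weights are all hypothetical, and the web extension step is not modeled.

```python
from collections import defaultdict


def build_index(labeled_words):
    """Build an inverted index: characteristic word -> {category: weight}.

    labeled_words is an iterable of (word, category, weight) triples.
    """
    index = defaultdict(dict)
    for word, category, weight in labeled_words:
        index[word][category] = index[word].get(category, 0.0) + weight
    return index


def classify(query, index):
    """Return the highest-scoring category, or None if no word matches.

    Assumption: a category's score is the sum of the weights of all
    characteristic words that occur in the query as substrings.
    """
    scores = defaultdict(float)
    for word, postings in index.items():
        if word in query:  # substring match, so no word segmentation is needed
            for category, weight in postings.items():
                scores[category] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the paper.
INDEX = build_index([
    ("download", "software", 1.0),
    ("lyrics", "music", 2.0),
    ("mp3", "music", 1.0),
    ("mp3", "software", 0.5),
])
```

Calling `classify("mp3 lyrics", INDEX)` sums the weights per category and picks the largest total. A query that contains no characteristic word yields `None`, which is the kind of case where expanding the query with web results could plausibly help.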
Source
《中文信息学报》
CSCD
PKU Core Journal (北大核心)
2009, Issue 3, pp. 39-44 (6 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (60773027, 60736044)
National 863 Program, Key Project (2006AA010108, 2008AA01Z145)
Keywords
computer application
Chinese information processing
Web query classification
"VASE" characteristic words
Web extension
weighted characteristic words