Abstract
Microblog search mainly computes the relevance between documents and query terms: term weights are determined by statistical methods, and relevance is then calculated with the vector space model. However, term-based search has limited accuracy, and the information content it detects in an individual microblog is limited, so it is hard to reflect what the user's query is actually focused on. To address this problem, we propose a ranking algorithm for microblog search based on a dynamic step size. The algorithm proceeds as follows. First, the existing features of microblogs are analyzed. Second, the information content of each microblog is calculated using information entropy, and microblog relevance is computed with parts of speech, rather than individual terms, as the unit of calculation. Finally, a dynamic step size is introduced into the ListNet ranking algorithm and optimized using the Armijo-Goldstein rule. Simulation experiments show that the proposed algorithm achieves better ranking performance.
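The final step of the abstract, a step size chosen per iteration under the Armijo-Goldstein rule inside ListNet training, can be illustrated with a minimal sketch. The abstract does not say how the authors integrate the step size, so everything here is an assumption for illustration: the linear scoring model, the top-one ListNet loss, the constant c = 0.25, the function names, and the toy feature values in TOY_QUERIES. The entropy and part-of-speech features described above are not modeled.

```python
import math

def softmax(values):
    # Numerically stable softmax over a list of floats.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def score(w, x):
    # Linear scoring function (an assumption; the abstract does not specify one).
    return sum(wi * xi for wi, xi in zip(w, x))

def listnet_loss(w, queries):
    # Top-one ListNet loss: cross entropy between the label distribution
    # and the predicted score distribution, summed over queries.
    loss = 0.0
    for feats, labels in queries:
        p_true = softmax(labels)
        p_pred = softmax([score(w, x) for x in feats])
        loss -= sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))
    return loss

def listnet_grad(w, queries):
    # Gradient of the loss above with respect to w.
    grad = [0.0] * len(w)
    for feats, labels in queries:
        p_true = softmax(labels)
        p_pred = softmax([score(w, x) for x in feats])
        for x, pt, pp in zip(feats, p_true, p_pred):
            for k, xk in enumerate(x):
                grad[k] += (pp - pt) * xk
    return grad

def armijo_goldstein_step(w, queries, grad, step=1.0, c=0.25, max_iter=30):
    # Search for a step along -grad satisfying both Goldstein bounds:
    #   f0 + (1 - c) * step * slope <= f(step) <= f0 + c * step * slope
    f0 = listnet_loss(w, queries)
    slope = -sum(g * g for g in grad)  # directional derivative along -grad
    lo, hi = 0.0, None
    for _ in range(max_iter):
        trial = [wi - step * gi for wi, gi in zip(w, grad)]
        f1 = listnet_loss(trial, queries)
        if f1 > f0 + c * step * slope:
            # Step too long: sufficient decrease (Armijo) fails, so shrink.
            hi = step
            step = (lo + hi) / 2
        elif f1 < f0 + (1 - c) * step * slope:
            # Step too short: Goldstein lower bound fails, so grow.
            lo = step
            step = 2 * step if hi is None else (lo + hi) / 2
        else:
            break
    return step

def train(queries, dim, epochs=20, tol=1e-12):
    # Gradient descent where each iteration picks its own step size.
    w = [0.0] * dim
    for _ in range(epochs):
        grad = listnet_grad(w, queries)
        if sum(g * g for g in grad) < tol:
            break
        step = armijo_goldstein_step(w, queries, grad)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical toy data: (feature vectors per document, relevance labels).
TOY_QUERIES = [
    ([[1.0, 0.2], [0.5, 0.1], [0.1, 0.9]], [2.0, 1.0, 0.0]),
    ([[0.9, 0.3], [0.2, 0.8]], [1.0, 0.0]),
]
```

Calling train(TOY_QUERIES, dim=2) returns a weight vector whose loss is lower than that of the zero vector. Compared with a fixed learning rate, the line search avoids hand-tuning the step and rejects both overly long and overly short steps at every iteration, at the cost of extra loss evaluations.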
Source
《湖北大学学报(自然科学版)》
CAS
2016, No. 3, pp. 258-266 (9 pages)
Journal of Hubei University: Natural Science
Funding
Supported by the National Natural Science Foundation of China (61202248)