Abstract
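Sparsity is a prominent problem in applying collaborative filtering algorithms: when the users' ratings of items are sparse, the precision and coverage of the algorithm drop significantly. To address this problem, the paper analyzes the factors that affect similarity computation in item-based collaborative filtering and proposes "item relation density" as a characteristic indicator of the rating matrix. It analyzes and summarizes the effect of item relation density on typical item-based collaborative filtering algorithms and then proposes a virtual user filling algorithm. Experimental results show that virtual user filling effectively improves the precision and coverage of typical item-based collaborative filtering algorithms on sparse datasets.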
This paper presents an algorithm that improves the performance of item-based collaborative filtering on sparse datasets. The factors that affect the correlation calculation in item-based collaborative filtering are analyzed, and the item relation density is introduced as an important characteristic for describing the rating matrix. The effect of the item relation density on item-based collaborative filtering is then illustrated. The item relation density is used to develop a virtual user filling algorithm, which effectively improves the precision and coverage of item-based algorithms on sparse datasets. The item relation density is thus a key characteristic of rating matrices.
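To make the sparsity issue concrete, the following is a minimal Python sketch of why item-to-item similarity breaks down when few users rate the same pair of items. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the formula for item relation density, so `comparable_pair_fraction` is a hypothetical proxy (the share of item pairs with at least one co-rating user), and the function names and `toy_ratings` data are invented for this example. Plain cosine similarity stands in for whichever similarity measure the paper evaluates.

```python
from itertools import combinations
from math import sqrt


def co_rating_users(ratings, item_a, item_b):
    """Return the users who rated both items.

    `ratings` maps user -> {item: score} (a sparse rating matrix).
    """
    return [u for u, r in ratings.items() if item_a in r and item_b in r]


def cosine_similarity(ratings, item_a, item_b):
    """Cosine similarity over co-rating users; None if it cannot be computed."""
    users = co_rating_users(ratings, item_a, item_b)
    if not users:
        return None  # no overlap: the pair is not comparable
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in users)
    norm_a = sqrt(sum(ratings[u][item_a] ** 2 for u in users))
    norm_b = sqrt(sum(ratings[u][item_b] ** 2 for u in users))
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)


def comparable_pair_fraction(ratings):
    """Hypothetical proxy, NOT the paper's definition of item relation density:
    the fraction of item pairs that share at least one co-rating user."""
    items = sorted({i for r in ratings.values() for i in r})
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if co_rating_users(ratings, a, b))
    return linked / len(pairs)


# Invented toy data: three users, four items, very few overlapping ratings.
toy_ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"B": 4, "C": 2},
    "u3": {"D": 1},
}

print(cosine_similarity(toy_ratings, "A", "C"))  # no user rated both A and C
print(comparable_pair_fraction(toy_ratings))     # share of comparable item pairs
```

In this toy matrix only two of the six item pairs share a rater, so most similarities are undefined and an item-based predictor has little to work with. This is the kind of gap that filling the matrix with additional (virtual) users is meant to reduce.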
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, No. 10, pp. 1725-1728 (4 pages)
Journal of Tsinghua University (Science and Technology)
Funding
National "863" High-Tech Program (2006AA010101)
National "Eleventh Five-Year" Science and Technology Support Program (2006BAH02A12)
Keywords
collaborative filtering
sparsity problem
item relation density
virtual user filling