Abstract
In a personalized news recommendation system, article deduplication is an important module that prevents the same article from being recommended to a user more than once. With a very large user base, the traditional queue-based deduplication method consumes a large amount of memory. A Bloom filter is a highly space-efficient probabilistic data structure suited to scenarios that can tolerate a certain false positive rate. Building on the Bloom filter, this paper designs two structures for article deduplication: a double Bloom filter bit-array structure and a Bloom filter bit-array chain structure. Experiments show that the deduplication method based on the Bloom filter bit-array chain not only greatly reduces the server memory the program requires, but also offers good flexibility and extensibility.
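The abstract does not describe the internals of the bit-array chain, so the following Python sketch shows only one plausible reading: each user keeps a short chain of fixed-capacity Bloom filters, a new filter is appended when the newest one fills up, and the oldest is dropped once the chain exceeds a maximum length. The names `BloomFilter`, `BloomFilterChain`, `recommend`, and all parameters are hypothetical and do not come from the paper.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter backed by a bytearray bit vector."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing formulas: m = -n ln p / (ln 2)^2, k = (m / n) ln 2.
        self.capacity = capacity
        self.num_bits = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0  # number of add() calls, used to decide when the filter is full

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class BloomFilterChain:
    """Assumed per-user chain of bit vectors; the oldest is dropped when the chain is too long."""

    def __init__(self, capacity_per_filter=1000, max_filters=4, error_rate=0.01):
        self.capacity_per_filter = capacity_per_filter
        self.max_filters = max_filters
        self.error_rate = error_rate
        self.filters = [BloomFilter(capacity_per_filter, error_rate)]

    def seen(self, article_id):
        # An article counts as seen if any filter in the chain reports it.
        return any(article_id in f for f in self.filters)

    def mark(self, article_id):
        tail = self.filters[-1]
        if tail.count >= tail.capacity:
            # Newest filter is full: extend the chain, evicting the oldest if needed.
            tail = BloomFilter(self.capacity_per_filter, self.error_rate)
            self.filters.append(tail)
            if len(self.filters) > self.max_filters:
                self.filters.pop(0)
        tail.add(article_id)


def recommend(chain, candidates):
    """Return the candidates not yet seen by this user, marking them as recommended."""
    fresh = []
    for article_id in candidates:
        if not chain.seen(article_id):
            chain.mark(article_id)
            fresh.append(article_id)
    return fresh
```

Under these assumptions, memory per user is bounded by `max_filters` bit vectors, and a light user pays for only one. The trade-off is that a false positive occasionally suppresses an article the user has never seen, and an article recorded only in an evicted filter may be recommended again.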
Source
Computing Technology and Automation (《计算技术与自动化》)
2016, No. 1, pp. 95-100 (6 pages)
Funding
Sub-topic project of a National Major Special Project under the 12th Five-Year Plan (2011ZX05020-007-007)