Abstract
Most web search engine sites maintain databases holding large numbers of local copies of pages from remote web sites. Keeping these copies consistent with their source pages requires periodically fetching the remote pages to refresh the local copies, which consumes considerable time and resources. This article proposes several new policies for synchronizing copies with their source pages, proposes a Poisson model of source page changes, gives formal definitions of freshness and fresh time, and studies and compares the performance of the policies. The results show that the proposed policies significantly improve the freshness of the web page database.
Source
《华东师范大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University core journal list)
2006, No. 1, pp. 108-115 (8 pages)
Journal of East China Normal University(Natural Science)
Keywords
synchronization technology
web page
database
refresh
search engine