Research on Incremental Updating Strategy of Web Forum Based on Statistics (cited by: 1)

Abstract: Traditional models for predicting web page changes apply a single rule to all pages and ignore the differences between them. This paper proposes an incremental updating strategy model, based on statistical regularities, for the index pages of web forums. Data collected from the index pages of relevant forum boards show that page changes follow a roughly daily cycle, and that the change curve within a day matches people's daily routines. The changes are then modeled by least-squares polynomial curve fitting, and the resulting model is applied to the incremental updating of index pages, so that the time interval until an index page's next update can be predicted accurately. Experimental results show that, within a 10% error margin, the model's prediction accuracy is 93.9%.
Source: Computer Applications and Software (《计算机应用与软件》), 2017, No. 6, pp. 31-36, 129 (7 pages)
Funding: National Natural Science Foundation of China (61402342)
Keywords: incremental updating; page changes; statistics; mathematical modeling
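
The modeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hourly post counts are invented for illustration, and the polynomial degree (6), the revisit threshold (5 new posts), and the function names `fit_daily_rate` and `next_update_interval` are all assumptions. The sketch fits a least-squares polynomial to posts per hour over one day, then estimates how long a crawler should wait until the fitted curve predicts a given number of new posts.

```python
import numpy as np

# Hypothetical new-post counts per hour (hour 0..23) for one forum index page.
# Invented numbers for illustration only; not data from the paper.
hours = np.arange(24, dtype=float)
new_posts = np.array([12, 7, 4, 2, 2, 3, 6, 12, 20, 28, 32, 30,
                      27, 29, 31, 30, 28, 27, 30, 34, 36, 33, 26, 18],
                     dtype=float)


def fit_daily_rate(hours, counts, degree=6):
    """Least-squares polynomial fit of posts-per-hour against hour of day.

    The degree is an assumed value; the abstract does not state one.
    """
    return np.polynomial.Polynomial.fit(hours, counts, degree)


def next_update_interval(rate, now_hour, threshold=5.0,
                         step=1.0 / 60.0, max_hours=24.0):
    """Hours until the fitted rate predicts `threshold` new posts.

    Numerically integrates the fitted curve forward from `now_hour`,
    wrapping at 24 h to reflect the daily cycle.
    """
    accumulated = 0.0
    elapsed = 0.0
    while elapsed < max_hours:
        t = (now_hour + elapsed) % 24.0
        # A fitted polynomial can dip below zero, so clamp the rate.
        accumulated += max(float(rate(t)), 1e-6) * step
        elapsed += step
        if accumulated >= threshold:
            return elapsed
    return max_hours


rate = fit_daily_rate(hours, new_posts)
for h in (4.0, 20.0):
    wait = next_update_interval(rate, h)
    print(f"at {h:04.1f} h: revisit in about {wait * 60:.0f} minutes")
```

Under these assumptions the predicted revisit interval is short during busy evening hours and long in the early morning, which is the behavior an incremental crawler would exploit.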
