Abstract
Discovering useful information in web logs is a pressing goal for every website administrator, and analyzing such data can help web-based organizations design marketing strategies, target customers in electronic commerce, improve system design, and enhance server performance. However, the inaccuracy of web server logs complicates the data preparation phase of web usage mining. Traditional data mining domains such as POS databases have naturally defined transactions; web logs contain no such transactions, and there is no convenient way to derive them by grouping web references. This paper first describes the user browsing behavior model underlying the reference length transaction identification approach, then proposes two improvements to that model: adding a network delay parameter and accounting for the handling of noise data. The improved model adapts to situations where network delay is large and varies over time, and the improved algorithm captures users' browsing patterns more accurately.
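The abstract does not give the model's formulas, so the following is only a minimal sketch of how reference length transaction identification with the two proposed refinements might look. It assumes the commonly cited formulation in which reference lengths are treated as exponentially distributed and the cutoff between auxiliary and content pages is -ln(1 - γ) / λ, where γ is the assumed fraction of auxiliary references and λ is the reciprocal of the mean reference length. The names `Reference`, `cutoff_time`, and `split_transactions`, the constant `network_delay` subtraction, and the `max_length` noise threshold are all hypothetical illustrations, not the paper's actual definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class Reference:
    url: str
    timestamp: float  # request time in seconds


def cutoff_time(lengths, aux_fraction):
    """Cutoff = -ln(1 - gamma) / lambda, with lambda = 1 / mean(lengths).

    Assumes reference lengths follow an exponential distribution.
    """
    mean_length = sum(lengths) / len(lengths)
    return -math.log(1.0 - aux_fraction) * mean_length


def split_transactions(session, aux_fraction=0.8, network_delay=0.0,
                       max_length=1800.0):
    """Split one user session into transactions (lists of URLs).

    network_delay: assumed constant delay (seconds) subtracted from every
        observed gap; a time-varying estimate could be substituted.
    max_length: assumed noise threshold; longer gaps are ignored when
        estimating the cutoff but are still classified afterwards.
    """
    if not session:
        return []
    # Reference length = gap to the next request, minus the network delay.
    lengths = [max(0.0, nxt.timestamp - cur.timestamp - network_delay)
               for cur, nxt in zip(session, session[1:])]
    clean = [t for t in lengths if t <= max_length]
    if not clean:
        return [[ref.url for ref in session]]
    cutoff = cutoff_time(clean, aux_fraction)

    transactions, current = [], []
    for ref, length in zip(session, lengths):
        current.append(ref.url)
        if length > cutoff:  # content page: closes the current transaction
            transactions.append(current)
            current = []
    # The last reference has no measurable length; treat it as content.
    current.append(session[-1].url)
    transactions.append(current)
    return transactions


session = [Reference("A", 0), Reference("B", 5), Reference("C", 10),
           Reference("D", 310), Reference("E", 315), Reference("F", 320)]
transactions = split_transactions(session, aux_fraction=0.8, network_delay=2.0)
```

In this sketch each transaction is a run of auxiliary (navigation) references closed by one content reference. Subtracting the delay before classification keeps slow page loads from being mistaken for reading time, which is the intuition behind the network delay parameter.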
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2002, Issue 1, pp. 116-118 (3 pages)
Journal of Chinese Computer Systems