Abstract
Understanding the behavior and characteristics of users is important for the design and maintenance of websites, and analysis of Web access logs is an effective way to obtain information about Web usage and to support Web personalization. This paper presents a new algorithm for categorizing Web users from Web log files. First, a link graph is constructed after data cleaning and transaction identification are applied to the raw log data. A new index of similarity between two pages is then defined from the link relationships between pages, and similar pages are merged into page classes, yielding a compressed link graph. Finally, FDOD, a measure of discrepancy between ordered sequences, is used to categorize the link paths in the graph into several classes. Experiments on log data from a real website indicate that the proposed approach is easy to use, responds quickly, and achieves good accuracy.
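The graph-construction and page-merging steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's similarity index or the FDOD computation, so this example substitutes Jaccard similarity over out-links as a stand-in and omits the classification step entirely. The function names, the `threshold` parameter, and the session format (a list of page paths per user transaction) are all assumptions made for illustration.

```python
from itertools import combinations


def build_link_graph(sessions):
    """Return {page: set of pages requested immediately after it}."""
    graph = {}
    for path in sessions:
        for page in path:
            graph.setdefault(page, set())  # every visited page is a node
        for src, dst in zip(path, path[1:]):
            if src != dst:  # ignore reloads of the same page
                graph[src].add(dst)
    return graph


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar_pages(graph, threshold=0.5):
    """Map each page to a class representative.

    ASSUMPTION: Jaccard similarity of out-link sets stands in for the
    paper's similarity index, which the abstract does not specify.
    """
    parent = {page: page for page in graph}

    def find(page):
        # Union-find lookup with path halving.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for a, b in combinations(sorted(graph), 2):
        if jaccard(graph[a], graph[b]) >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the lexicographically smaller name as representative.
                parent[max(ra, rb)] = min(ra, rb)
    return {page: find(page) for page in graph}


def compress_paths(sessions, page_class):
    """Rewrite each path in terms of page classes, dropping consecutive repeats."""
    compressed = []
    for path in sessions:
        out = []
        for page in path:
            cls = page_class[page]
            if not out or out[-1] != cls:
                out.append(cls)
        compressed.append(out)
    return compressed
```

The compressed paths returned by `compress_paths` would then be the input to a sequence-discrepancy measure such as FDOD. Union-find is used here only so that similarity merges are transitive; the paper may group pages differently.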
Source
Henan Science (《河南科学》)
2007, No. 1, pp. 112-117 (6 pages)
Funding
Natural Science Foundation of the Education Department of Henan Province, China (2006520022)
Youth Foundation of Zhoukou Normal University (ZKNUQN200615)