Abstract
WebLog access sequential pattern mining applies the sequential pattern techniques of data mining to Web server log files. The mined patterns are then used to improve Web information services. The main challenge is cost: mining the large volume of log data consumes substantial system resources. Combining the ideas of SPAM and PrefixSpan, this paper proposes a new algorithm, SPAM-FPT. SPAM-FPT constructs a First_Position_Table, which lets it avoid two costs: the "ANDing" and "joining" operations of SPAM, and the many projected databases that PrefixSpan builds. It mines all frequent sequential patterns (frequent subsequences) in the database efficiently.
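The abstract does not describe the internal layout of the First_Position_Table. The following is therefore only a minimal sketch under stated assumptions, not the authors' SPAM-FPT. It shows how a per-sequence position table can drive PrefixSpan-style pattern growth without materializing projected databases or ANDing bitmaps. For each prefix, it keeps only a list of (sequence id, end position) pairs and looks up the first position of the next item after that point.

The sketch makes three further assumptions:
- Each access sequence is a list of single page IDs, with no itemsets.
- Support counts sequences, not occurrences.
- The names `build_position_table`, `first_position_after` and `mine` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_position_table(sequences):
    """Hypothetical position table: for each sequence, map item -> sorted positions."""
    table = []
    for seq in sequences:
        positions = defaultdict(list)
        for i, item in enumerate(seq):
            positions[item].append(i)  # indices are appended in increasing order
        table.append(dict(positions))
    return table


def first_position_after(pos_table, sid, item, end):
    """Return the first index of `item` in sequence `sid` strictly after `end`, or None."""
    plist = pos_table[sid].get(item)
    if not plist:
        return None
    k = bisect_right(plist, end)
    return plist[k] if k < len(plist) else None


def mine(sequences, min_support):
    """Sketch of pattern growth over a position table (assumed, not the paper's code)."""
    pos_table = build_position_table(sequences)

    # Support of a single item = number of sequences that contain it.
    item_support = defaultdict(int)
    for positions in pos_table:
        for item in positions:
            item_support[item] += 1
    frequent_items = sorted(i for i, c in item_support.items() if c >= min_support)

    results = {}

    def grow(prefix, occurrences):
        # `occurrences` holds one (sid, end) pair per supporting sequence, where
        # `end` is the leftmost position at which `prefix` finishes matching.
        for item in frequent_items:
            extended = []
            for sid, end in occurrences:
                p = first_position_after(pos_table, sid, item, end)
                if p is not None:
                    extended.append((sid, p))
            if len(extended) >= min_support:
                new_prefix = prefix + (item,)
                results[new_prefix] = len(extended)
                grow(new_prefix, extended)  # no projected database is copied

    grow((), [(sid, -1) for sid in range(len(sequences))])
    return results
```

For example, `mine([["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "b"]], 2)` returns seven patterns. These include `("a", "c")` with support 3 and `("a", "b", "c")` with support 2.

Keeping only the leftmost match per sequence is sufficient for single-item sequences, because the earliest embedding of a prefix leaves the most room for any later extension.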
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2007, No. 17, pp. 80-82 (3 pages)