Abstract
The timeliness of customer behavior is one of the key factors in analyzing customers' online consumption behavior, but the PrefixSpan data mining algorithm does not take it into account during mining. Building on the concept of time-interval sequential patterns, this paper therefore proposes an improved PrefixSpan algorithm based on time interval and click quantity. The algorithm introduces the concepts of frequent degree and time attribute and incorporates time interval and click quantity as factors, so that the mined information reflects real-time behavior and the mining places greater emphasis on the objects of interest. Experiments comparing it with the original PrefixSpan algorithm show that, on data sets with temporal characteristics, the improved algorithm yields more precise mining results and effectively improves mining efficiency.
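The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a PrefixSpan-style prefix-growth miner over single-item click events, where `max_gap` (a hypothetical parameter) bounds the time interval between consecutive matched events and `min_clicks` (also hypothetical) drops events whose click count is too low. The paper's "frequent degree" and "time attribute" definitions are not reproduced here.

```python
from collections import defaultdict


def mine_time_constrained(sequences, min_support, max_gap, min_clicks=1):
    """Prefix-growth sequential pattern mining with a time-gap constraint.

    sequences: list of sequences; each is a list of (timestamp, item, clicks).
    Returns a dict mapping each frequent pattern (tuple of items) to the
    number of sequences that contain it with every consecutive pair of
    matched events at most `max_gap` apart in time.
    Assumption (not from the paper): events with clicks < min_clicks are ignored.
    """
    # Drop low-click events and sort each sequence by timestamp.
    db = [
        sorted((t, item) for (t, item, clicks) in seq if clicks >= min_clicks)
        for seq in sequences
    ]
    results = {}

    def grow(pattern, occurrences):
        # occurrences: seq_id -> set of positions where the pattern's
        # last item can be matched under the gap constraint.
        candidates = defaultdict(lambda: defaultdict(set))
        for sid, positions in occurrences.items():
            seq = db[sid]
            for p in positions:
                t_last = seq[p][0]
                for q in range(p + 1, len(seq)):
                    t, item = seq[q]
                    if t - t_last > max_gap:
                        break  # sequence is time-sorted, later events are farther
                    candidates[item][sid].add(q)
        for item in sorted(candidates):
            occ = candidates[item]
            if len(occ) >= min_support:
                new_pattern = pattern + (item,)
                results[new_pattern] = len(occ)
                grow(new_pattern, occ)

    # Seed with length-1 patterns: every position of every item.
    seeds = defaultdict(lambda: defaultdict(set))
    for sid, seq in enumerate(db):
        for q, (_, item) in enumerate(seq):
            seeds[item][sid].add(q)
    for item in sorted(seeds):
        occ = seeds[item]
        if len(occ) >= min_support:
            results[(item,)] = len(occ)
            grow((item,), occ)
    return results


# Toy click logs: (timestamp, page, clicks); the values are made up.
logs = [
    [(0, "a", 3), (5, "b", 2), (8, "c", 1)],
    [(0, "a", 1), (20, "b", 4), (22, "c", 2)],
    [(0, "a", 2), (3, "b", 1), (4, "c", 5)],
]
patterns = mine_time_constrained(logs, min_support=2, max_gap=10)
```

The sketch tracks every valid end position of a pattern rather than only the first one, because with a gap constraint the earliest match is not always the one that can be extended. In the toy data, the second log contributes to the pattern `("b", "c")` but not to `("a", "b")`, since its `a` and `b` events are 20 time units apart.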
Source
Computer Technology and Development (《计算机技术与发展》)
2011, No. 10, pp. 81-84 (4 pages)
Funding
Natural Science Foundation of Shanxi Province (Grant No. 2009011022-1)
Keywords
time interval
click quantity
sequence patterns
data mining