Abstract
To mine frequent paths efficiently from massive logistics data, this paper proposes a logistics data model, designed around the characteristics of logistics networks and logistics flows, together with PMWTI (Path Mining With Topology Information), a frequent path sequence mining algorithm that takes full account of the topology of the logistics network. PMWTI includes a cost tolerance pruning method for deep pruning of candidate path sequences: building on pruning by the Apriori property, it further discards some candidates that survive Apriori pruning but cannot be frequent path sequences. This reduces the number of candidate path sequences to some extent and, in turn, the number of scans over the dataset. Experiments show that PMWTI mines frequent paths more efficiently than an otherwise equivalent algorithm that does not use this pruning method.
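The abstract does not spell out the pruning rule, so the following is only an illustrative sketch of how an Apriori-style path miner with an extra cost-based pruning step might look, not the paper's PMWTI implementation. Everything named here is an assumption: the function names, the dictionary-of-dictionaries graph, the toy data, and in particular the rule itself, which drops a candidate when its cost exceeds `tolerance` times the cheapest cost between its endpoints. That rule is a heuristic stand-in for the paper's cost tolerance pruning.

```python
import heapq

def shortest_costs(graph, source):
    """Dijkstra: cheapest cost from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def path_cost(graph, path):
    """Sum of edge costs along a node sequence."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def support(path, routes):
    """Number of routes containing `path` as a contiguous sub-path."""
    k = len(path)
    return sum(
        any(tuple(r[i:i + k]) == path for i in range(len(r) - k + 1))
        for r in routes
    )

def mine_frequent_paths(graph, routes, min_support, tolerance=None):
    """Return (frequent paths with support, number of candidates counted)."""
    best = {u: shortest_costs(graph, u) for u in graph}
    # Topology information: level-1 candidates are exactly the network edges.
    candidates = {(u, v) for u in graph for v in graph[u]}
    frequent, counted = {}, 0
    while candidates:
        level = {}
        for c in sorted(candidates):
            # Assumed cost-tolerance rule (hypothetical, not from the paper).
            if (tolerance is not None
                    and path_cost(graph, c) > tolerance * best[c[0]][c[-1]]):
                continue
            counted += 1  # each counted candidate costs one pass over routes
            s = support(c, routes)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Apriori join: extend p by q only if both are frequent and overlap.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
    return frequent, counted

# Toy network and shipment routes (made up for illustration).
graph = {"A": {"B": 1, "D": 1}, "B": {"C": 1}, "C": {"D": 1}, "D": {}}
routes = [["A", "B", "C"]] * 2 + [["B", "C", "D"]] * 2 + [["A", "D"]] * 2

plain, n_plain = mine_frequent_paths(graph, routes, min_support=2)
pruned, n_pruned = mine_frequent_paths(graph, routes, min_support=2,
                                       tolerance=2.0)
```

On this toy data the candidate A-B-C-D passes the Apriori join, because A-B-C and B-C-D are both frequent, yet it costs three times as much as the direct edge A-D, so the assumed rule discards it before its support is counted. Both runs return the same frequent paths, and the pruned run counts one candidate fewer.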
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2015, Issue 4, pp. 258-262 (5 pages)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (61363027)
and the Guangxi Natural Science Foundation (2012GXNSFAA053225)
Keywords
Logistics
Frequent path
Sequence pattern
Data mining