Funding: Supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2013RC0114 and the 111 Project of China under Grant No. B08004.
Abstract: With increasingly complex website structures and continuously advancing web technologies, accurately recognizing user clicks in massive HTTP data, which is critical for web usage mining, has become more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm that distinguishes user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm on a large real-world dataset collected from a mobile core network; it is 228.7 GB in size and covers more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
Funding: Supported by the National Basic Research Program of China (973 Program) through grant 2012CB316004; the Doctoral Program of Higher Education (SRFDP) and Research Grants Council Earmarked Research Grants (RGC ERG) Joint Research Scheme through Specialized Research Fund 20133402140001; and the National Natural Science Foundation of China through grant 61379003.
Abstract: Cellular networks are overloaded due to the surge in mobile traffic, and mobile social networks (MSNets) can be leveraged for traffic offloading. In this paper, we study how to choose seed users so as to maximize the mobile traffic offloaded from cellular networks. We introduce a gossip-style social cascade (GSC) model to capture the epidemic-like information diffusion process in MSNets. For static-case and mobile-case networks, we establish an equivalent view and a temporal mapping of the information diffusion process, respectively. We further prove the submodularity of the information diffusion and propose a greedy algorithm to choose the seed users for traffic offloading, yielding a sub-optimal solution to the NP-hard traffic offloading maximization (TOM) problem. Experiments on offloading performance show that the greedy algorithm significantly outperforms the heuristic and random algorithms, and that user mobility can help further reduce cellular load.
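The greedy seed selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the GSC model or the paper's objective, so this example makes a loud assumption: a deterministic coverage function (the number of distinct users reached) stands in for the expected offloaded traffic. Coverage is monotone and submodular, which is the property a greedy choice relies on. The names `greedy_seeds` and `reach` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Set


def greedy_seeds(reach: Dict[Hashable, Set[Hashable]], k: int) -> List[Hashable]:
    """Greedily pick up to k seed users that maximize the number of covered users.

    ASSUMPTION: reach[u] is the set of users who would receive the content if u
    were a seed. This is a stand-in for the paper's diffusion model, which the
    abstract does not describe.
    """
    covered: Set[Hashable] = set()
    seeds: List[Hashable] = []
    for _ in range(k):
        best, best_gain = None, 0
        for user, users_reached in reach.items():
            if user in seeds:
                continue
            # Marginal gain: users newly covered by adding this candidate.
            gain = len(users_reached - covered)
            if gain > best_gain:
                best, best_gain = user, gain
        if best is None:
            # No remaining candidate adds coverage, so stop early.
            break
        seeds.append(best)
        covered |= reach[best]
    return seeds


# Toy input (hypothetical data, not from the paper's experiments).
reach = {
    "a": {"a", "b", "c"},
    "b": {"b", "d"},
    "c": {"c"},
    "d": {"d", "e", "f", "g"},
}
print(greedy_seeds(reach, 2))
```

For a monotone submodular objective under a cardinality constraint, this kind of greedy choice is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is consistent with the abstract's description of the result as sub-optimal for an NP-hard problem.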