Abstract
The rapid development of mobile networks gives researchers the opportunity to analyze user behavior from large-scale network traffic data. Such analysis is important for Internet Service Providers (ISPs), who need to optimize resource allocation and provide customized services to users. The first step in analyzing user behavior is to extract information about user actions from HTTP traffic data by multi-pattern URL matching. However, efficiency becomes a serious problem when this work is performed on massive network traffic data. To solve this problem, we propose a novel and accurate algorithm named Multi-Pattern Parallel Matching (MPPM), which takes advantage of HashMap lookups to extract user behaviors from big network data more efficiently. Extensive experiments on real-world traffic data demonstrate that the MPPM algorithm can handle massive HTTP traffic with better accuracy, concurrency, and efficiency. We expect the proposed algorithm and its parallelized implementation to be a solid base for building a high-performance user behavior analysis engine based on massive HTTP traffic data processing.
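The abstract does not spell out how MPPM works internally, so the following is only a minimal sketch of the general idea it names: hash-map based multi-pattern URL matching applied to many URLs concurrently. The pattern table, the host names, the action labels, and the functions `build_index`, `match_url`, and `match_all` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit

# Hypothetical pattern table: (host, path prefix, action label).
PATTERNS = [
    ("shop.example.com", "/cart/add", "add_to_cart"),
    ("shop.example.com", "/search", "search"),
    ("video.example.com", "/watch", "play_video"),
]

def build_index(patterns):
    """Group patterns by host so a lookup is one hash-map access."""
    index = defaultdict(list)
    for host, prefix, action in patterns:
        index[host].append((prefix, action))
    # Longest prefix first, so the most specific pattern wins.
    for rules in index.values():
        rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return dict(index)

def match_url(index, url):
    """Return the action label for a URL, or None if no pattern matches."""
    parts = urlsplit(url)
    for prefix, action in index.get(parts.hostname or "", []):
        if parts.path.startswith(prefix):
            return action
    return None

def match_all(index, urls, workers=4):
    """Match many URLs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: match_url(index, u), urls))
```

Keying the map by host means each URL is compared only against the few patterns registered for that host rather than against the whole pattern set. The thread pool here only illustrates the shape of a parallel driver; a real deployment on massive traffic would more likely partition the data across processes or cluster nodes.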
Funding
Supported in part by the National Natural Science Foundation of China (61671078) and the Director Funds of Beijing Key Laboratory of Network System Architecture and Convergence (2017BKL-NSACZJ-06).