The rapid development of mobile network brings opportunities for researchers to analyze user behaviors based on largescale network traffic data. It is important for Internet Service Providers(ISP) to optimize resource...The rapid development of mobile network brings opportunities for researchers to analyze user behaviors based on largescale network traffic data. It is important for Internet Service Providers(ISP) to optimize resource allocation and provide customized services to users. The first step of analyzing user behaviors is to extract information of user actions from HTTP traffic data by multi-pattern URL matching. However, the efficiency is a huge problem when performing this work on massive network traffic data. To solve this problem, we propose a novel and accurate algorithm named Multi-Pattern Parallel Matching(MPPM) that takes advantage of HashMap in data searching for extracting user behaviors from big network data more effectively. Extensive experiments based on real-world traffic data prove the ability of MPPM algorithm to deal with massive HTTP traffic with better performance on accuracy, concurrency and efficiency. We expect the proposed algorithm and it parallelized implementation would be a solid base to build a high-performance analysis engine of user behavior based on massive HTTP traffic data processing.展开更多
AAC算法(Advanced AC)是使用最为广泛的多模式串匹配算法,匹配性能高,匹配时间稳定。针对AAC算法为判定转移目标状态是否为终结状态,在匹配时每读入一个字符都要访问output表,代价较高的问题,通过两种方法改进了AAC算法。第一种方法为...AAC算法(Advanced AC)是使用最为广泛的多模式串匹配算法,匹配性能高,匹配时间稳定。针对AAC算法为判定转移目标状态是否为终结状态,在匹配时每读入一个字符都要访问output表,代价较高的问题,通过两种方法改进了AAC算法。第一种方法为拷贝自动机中的终结状态,将其附加在AAC自动机后,并将原自动机中指向终结状态的转移目标修改为附加状态,直接根据转移目标位置判断当前状态是否是终结状态,从而提出Advanced AC with Additive state(AACA)算法。第二种改进方法为将自动机中指向终结状态的状态转移值置为负数,根据转移目标的值直接判断目标状态是否为终结状态,从而提出Advanced AC with Negative state(AACN)算法。以上两种改进算法只有在发现模式匹配时才需进行output表的访问。实验数据表明:AACA和AACN算法性能均高于AAC算法,特别在中小规模匹配上,性能提升更为明显。展开更多
基金supported in part by National Natural Science Foundation of China(61671078)the Director Funds of Beijing Key Laboratory of Network System Architecture and Convergence(2017BKL-NSACZJ-06)
文摘The rapid development of mobile network brings opportunities for researchers to analyze user behaviors based on largescale network traffic data. It is important for Internet Service Providers(ISP) to optimize resource allocation and provide customized services to users. The first step of analyzing user behaviors is to extract information of user actions from HTTP traffic data by multi-pattern URL matching. However, the efficiency is a huge problem when performing this work on massive network traffic data. To solve this problem, we propose a novel and accurate algorithm named Multi-Pattern Parallel Matching(MPPM) that takes advantage of HashMap in data searching for extracting user behaviors from big network data more effectively. Extensive experiments based on real-world traffic data prove the ability of MPPM algorithm to deal with massive HTTP traffic with better performance on accuracy, concurrency and efficiency. We expect the proposed algorithm and it parallelized implementation would be a solid base to build a high-performance analysis engine of user behavior based on massive HTTP traffic data processing.
文摘AAC算法(Advanced AC)是使用最为广泛的多模式串匹配算法,匹配性能高,匹配时间稳定。针对AAC算法为判定转移目标状态是否为终结状态,在匹配时每读入一个字符都要访问output表,代价较高的问题,通过两种方法改进了AAC算法。第一种方法为拷贝自动机中的终结状态,将其附加在AAC自动机后,并将原自动机中指向终结状态的转移目标修改为附加状态,直接根据转移目标位置判断当前状态是否是终结状态,从而提出Advanced AC with Additive state(AACA)算法。第二种改进方法为将自动机中指向终结状态的状态转移值置为负数,根据转移目标的值直接判断目标状态是否为终结状态,从而提出Advanced AC with Negative state(AACN)算法。以上两种改进算法只有在发现模式匹配时才需进行output表的访问。实验数据表明:AACA和AACN算法性能均高于AAC算法,特别在中小规模匹配上,性能提升更为明显。