Abstract: Distributed intrusion detection systems face two key issues: maintaining system load balance and protecting data integrity. To address them, this paper proposes a new distributed intrusion detection model for big data based on nondestructive partitioning and balanced allocation. A data allocation strategy based on capacity and workload is introduced to achieve local load balance, and a dynamic load adjustment strategy is adopted to maintain global load balance across the cluster. Data integrity is protected through session reassembly and session partitioning. Simulation results show that the new model achieves good load balance together with a higher detection rate and detection efficiency.
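The abstract does not give the allocation rule itself, so the following is only a minimal sketch of one way a capacity- and workload-based allocation could keep sessions intact. It assumes packets arrive as `(session_id, size)` pairs and that each node has a known relative capacity; the function names and the greedy "lowest load-to-capacity ratio" rule are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict


def group_by_session(packets):
    """Group packet sizes by session id so a session is never split (assumed input format)."""
    sessions = defaultdict(list)
    for session_id, size in packets:
        sessions[session_id].append(size)
    return sessions


def allocate_sessions(packets, capacities):
    """Greedily assign whole sessions to the node with the lowest load/capacity ratio.

    `capacities` maps node name -> relative capacity (hypothetical units).
    Returns (assignment, loads): session id -> node, and node -> total assigned size.
    """
    loads = {node: 0 for node in capacities}
    assignment = {}
    sessions = group_by_session(packets)
    # Place the largest sessions first; ties are broken by session id for determinism.
    ordered = sorted(sessions.items(), key=lambda kv: (-sum(kv[1]), kv[0]))
    for session_id, sizes in ordered:
        # Pick the node whose current load is smallest relative to its capacity.
        node = min(loads, key=lambda n: (loads[n] / capacities[n], n))
        assignment[session_id] = node
        loads[node] += sum(sizes)
    return assignment, loads


if __name__ == "__main__":
    demo_packets = [("s1", 40), ("s2", 30), ("s1", 20), ("s3", 30), ("s4", 10)]
    demo_assignment, demo_loads = allocate_sessions(demo_packets, {"a": 2, "b": 1})
    print(demo_assignment, demo_loads)
```

Because every packet of a session goes to the same node, the detector on that node sees the complete session, which is the intent behind nondestructive partitioning. A dynamic adjustment step (migrating sessions when ratios drift apart) is not shown here.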
Abstract: On big data platforms, the large volume of data makes load imbalance a prominent problem. Most current load balancing methods suffer from high data flow loss rates and long response times, so a more effective load balancing method is urgently needed. Taking HBase as the research subject, this study analyzes a dynamic load balancing method for data flows. The HBase platform is first introduced briefly, and a dynamic load-balancing algorithm is then designed: the data flow is divided into blocks, the load of each node is predicted with the grey prediction GM(1,1) model, and the load is finally migrated through a dynamically adjustable method to achieve load balancing. The experimental results show that the method predicts load accurately, with an average error percentage of 0.93%, and that its average response time is short. Under 3000 tasks, its response time is 14.17% shorter than that of the method combining TV white space (TVWS) and long-term evolution (LTE); the average flow of the most heavily loaded nodes is also smaller, and the data flow loss rate is essentially 0%. These results demonstrate the effectiveness of the proposed method, which can be further promoted and applied in practice.
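For readers unfamiliar with the grey prediction step, the following is a minimal sketch of the textbook GM(1,1) model applied to a short load history. It is not the study's implementation: the function name `gm11_forecast`, the sample load values, and the minimum-length check are assumptions made for illustration.

```python
import math


def gm11_forecast(series, steps=1):
    """Fit a textbook GM(1,1) grey model and forecast `steps` future values."""
    x0 = [float(v) for v in series]
    n = len(x0)
    if n < 4:
        raise ValueError("this sketch expects at least 4 observations")

    # Accumulated generating operation (1-AGO): running sums of the raw series.
    x1 = []
    total = 0.0
    for v in x0:
        total += v
        x1.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]

    # Least-squares fit of y = -a * z + b (ordinary linear regression of y on z).
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; series is nearly constant")

    def x1_hat(k):
        # Time response of the whitened equation dx1/dt + a*x1 = b.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: differences of the fitted accumulated series give the forecasts.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    history = [100.0, 110.0, 121.0, 133.1, 146.41]  # hypothetical node loads
    print(gm11_forecast(history, steps=2))
```

GM(1,1) suits this setting because it needs only a handful of recent observations, so a node's next-interval load can be estimated before deciding whether to migrate work away from it.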
Funding: The National Key Basic Research Program of China (973 Program)
Abstract: To improve data distribution efficiency, a load-balancing data distribution (LBDD) method is proposed for the publish/subscribe mode. In the LBDD method, subscribers take part in distribution tasks and data transfers while receiving data themselves. A dissemination tree is constructed among the subscribers based on MD5, with the publisher acting as the root. The proposed method provides bucket construction, target selection, and path updates; furthermore, the property of one-way dissemination is proven. LBDD guarantees that the average out-going degree of a node is 2. Experiments are conducted on data distribution delay, data distribution rate, and load distribution. The results show that the LBDD method helps shape the task load between the publisher and subscribers and outperforms the point-to-point approach.
Funding: The National Natural Science Foundation of China under Grant Nos. 61802273, 61772356, and 61836007; the Postdoctoral Science Foundation of China under Grant No. 2017M621813; the Postdoctoral Science Foundation of Jiangsu Province of China under Grant No. 2018K029C; the Natural Science Foundation for Colleges and Universities in Jiangsu Province of China under Grant No. 18KJB520044; the Open Program of Neusoft Corporation under Grant No. SKLSAOP1801; and Blockshine Technology Corporation of China.
Abstract: As a fundamental operation in location-based services (LBS), trajectory similarity of moving objects has been studied extensively in recent years. However, given the growing volume of moving-object trajectories and the demand for interactive query performance, trajectory similarity queries must now be processed on massive datasets in real time. Existing work has proposed distributed or parallel solutions for large-scale trajectory similarity processing, but those techniques cannot be directly adapted to the real-time scenario, because they are likely to balance load poorly when the workload of the incoming trajectory stream varies. In this paper, we propose a new workload partitioning framework, ART (Adaptive Framework for Real-Time Trajectory Similarity), which introduces practical algorithms to support dynamic workload assignment for real-time trajectory similarity (RTTS). Our proposal includes a processing model tailored to the RTTS scenario, a load balancing framework that maximizes throughput, and an adaptive data partitioning scheme designed to cut unnecessary network cost. On this basis, our model handles large-scale trajectory similarity in an online scenario and achieves scalability, effectiveness, and efficiency at once. Empirical studies on synthetic data and real-world stream applications validate the usefulness of our proposal and show a substantial advantage over state-of-the-art solutions in the literature.
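Abstract: Sorting is a fundamental algorithm in computer science and lies at the core of a large number of applications. In the big data era, with the rapid growth of data volume, parallel sorting algorithms have received wide attention. Existing parallel sorting algorithms commonly suffer from excessive communication overhead and load imbalance, which makes them hard to scale. To address these problems, a scalable parallel sorting by regular sampling (ScaPSRS) algorithm is proposed. It abandons the practice in the traditional parallel sorting by regular sampling (PSRS) algorithm of having a single process perform the sampling and instead lets all processes take part in regular sampling to select p-1 splitters, which divide the whole dataset into p disjoint subsets that are then sorted in parallel, avoiding the single-process sampling bottleneck. In addition, ScaPSRS uses a new iterative update strategy to select the p-1 splitters, keeping the p subsets as close to equal in size as possible and thereby ensuring load balance when the p processes sort their own subsets locally. Extensive experiments on the Tianhe-2 supercomputer show that ScaPSRS scales successfully to 32,000 cores and improves performance by 3.7 times and 11.7 times over the PSRS algorithm and the partitioning algorithm proposed by Hofmann et al., respectively.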
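To make the role of the p-1 splitters concrete, here is a minimal single-process simulation of classic PSRS-style regular sampling. It is a sketch under stated assumptions, not ScaPSRS: the iterative splitter update and the all-process sampling described in the abstract are not reproduced, the p "processes" are simulated as list chunks, and the splitter positions use a simple evenly spaced rule chosen for illustration.

```python
import bisect
import heapq


def psrs_sort(data, p):
    """Simulate PSRS-style sorting with p logical processes in a single process.

    Returns (sorted_list, bucket_sizes) so the load balance across buckets can be inspected.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    n = len(data)
    if n == 0:
        return [], [0] * p

    # Step 1: split the input into p near-equal chunks and sort each locally.
    chunks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]

    # Step 2: take p regularly spaced samples from every non-empty sorted chunk.
    samples = []
    for chunk in chunks:
        samples.extend(chunk[j * len(chunk) // p] for j in range(p) if chunk)
    samples.sort()

    # Step 3: choose p-1 splitters at evenly spaced positions in the sorted samples
    # (an illustrative rule; published variants differ in the exact offsets).
    splitters = [samples[i * len(samples) // p] for i in range(1, p)]

    # Step 4: cut each sorted chunk at the splitters and route the pieces to buckets.
    buckets = [[] for _ in range(p)]
    for chunk in chunks:
        cuts = [0] + [bisect.bisect_right(chunk, s) for s in splitters] + [len(chunk)]
        for b in range(p):
            buckets[b].append(chunk[cuts[b]:cuts[b + 1]])

    # Step 5: each bucket merges the sorted runs it received.
    merged = [list(heapq.merge(*runs)) for runs in buckets]
    sizes = [len(m) for m in merged]
    return [x for m in merged for x in m], sizes


if __name__ == "__main__":
    import random

    values = list(range(1000))
    random.Random(0).shuffle(values)
    result, sizes = psrs_sort(values, 4)
    print(result == sorted(values), sizes)
```

The returned bucket sizes show how evenly the splitters divide the data; an approach such as the one in the abstract aims to make those sizes as close to equal as possible so that no process becomes a straggler during the final local sort.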