Funding: Project (2018YJS081) supported by the Fundamental Research Funds for the Central Universities, China; Projects (71273024, 51578052) supported by the National Natural Science Foundation of China (NSFC).
Abstract: Identifying bottlenecks and analyzing their characteristics are important tasks for city traffic management authorities. Although the speed difference has been proposed for bottleneck identification in existing research, the use of a secondary indicator has not been fully discussed. This paper develops a method for identifying bottlenecks on expressways using the massive floating car data (FCD) collected in Beijing. First, the speed characteristics of expressway bottlenecks were analyzed based on the speed contour map. The results indicated a significant difference between the speeds on the bottleneck link and the downstream link whenever a bottleneck was observed, so the speed difference can indeed serve as the primary indicator for bottleneck identification. However, the results also showed that a sufficiently large speed difference does not necessarily imply an active bottleneck. The speed-at-capacity was therefore used as a secondary indicator to distinguish real bottlenecks from non-bottleneck speed differences. Second, a practical method for identifying expressway bottlenecks was developed based on the speed difference and the speed-at-capacity. Finally, the method was applied to identify the bottlenecks of the 3rd Outer Ring Expressway in Beijing, and the duration, affected distance, delay, and cause were used to evaluate and analyze them.
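To make the two-indicator logic concrete, here is a minimal Python sketch of the kind of test the abstract describes: the speed difference between a link and its downstream neighbor is the primary indicator, and the link speed falling below the speed-at-capacity is the secondary indicator. The threshold, the speed-at-capacity value, and the segment names are illustrative assumptions, not the paper's calibrated parameters.

```python
# Minimal sketch of the two-indicator bottleneck test; all numeric
# values here are assumed for illustration, not taken from the paper.

def is_active_bottleneck(v_link, v_downstream,
                         diff_threshold=20.0,      # km/h, assumed threshold
                         speed_at_capacity=45.0):  # km/h, assumed value
    """Return True if the link behaves like an active bottleneck.

    Primary indicator: the downstream link is markedly faster than
    this link (queued traffic discharging into free flow downstream).
    Secondary indicator: the link speed is below the speed-at-capacity,
    which filters out non-bottleneck speed differences.
    """
    speed_difference = v_downstream - v_link
    return speed_difference >= diff_threshold and v_link < speed_at_capacity

# Hypothetical FCD-derived link speeds for one time slice,
# ordered upstream -> downstream.
link_speeds = {"S1": 62.0, "S2": 28.0, "S3": 58.0}   # km/h
order = ["S1", "S2", "S3"]
for link, downstream in zip(order, order[1:]):
    if is_active_bottleneck(link_speeds[link], link_speeds[downstream]):
        print(f"{link} is an active bottleneck")      # flags S2 only
```

In this toy time slice only S2 is flagged: the S2-to-S3 speed difference is large and S2 is below the assumed speed-at-capacity, while the S1-to-S2 pair fails the primary test.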
Funding: Supported by China MOST project (No. 2012BAH46B04).
Abstract: Pattern matching is a fundamental approach to detecting malicious behaviors and information over the Internet, and it has gradually been adopted in high-speed network traffic analysis. However, multi-pattern matching on online compressed network traffic (CNT) suffers from a performance bottleneck, because malicious and intrusion codes are often embedded in compressed network traffic. In this paper, we propose an online fast multi-pattern matching algorithm for compressed network traffic (FMMCN). FMMCN employs two types of jumping, i.e., jumping within the sliding window and a string jump scanning strategy, to skip unnecessary compressed bytes. Moreover, FMMCN can efficiently process multiple large-volume traffic sources such as HTTP traffic, vehicle traffic, and other Internet-based services. The experimental results show that FMMCN can skip more than 89.5% of the bytes, and its maximum speed reaches 176.470 MB/s on a midrange switch, almost 73.15 MB/s faster than ACCH, the fastest existing algorithm.
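The core of the jump idea can be sketched as follows: in LZ77-style compression, a back-reference copies bytes that were already scanned, so if the referenced region is known to contain no pattern, only the bytes straddling the copy's boundaries need rescanning. The token format, the naive inner matcher, and the pattern set below are simplifying assumptions; the actual FMMCN operates on DEFLATE streams with a multi-pattern automaton, so treat this purely as an illustration of the skip rule.

```python
# Simplified sketch of boundary-only rescanning over an LZ77-like token
# stream: tokens are either literal bytes or (distance, length) copies.

PATTERNS = [b"attack", b"worm"]          # hypothetical pattern set
MAX_LEN = max(map(len, PATTERNS))

def match_at(buf, i):
    """True if any pattern starts at offset i of buf."""
    return any(buf[i:i + len(p)] == p for p in PATTERNS)

def scan_tokens(tokens):
    """Scan the decompressed output, skipping the interior of clean copies."""
    buf = bytearray()
    dirty = []                           # dirty[i]: a pattern starts at i
    hits = []
    for tok in tokens:
        base = len(buf)
        if isinstance(tok, tuple):       # back-reference: copy earlier bytes
            dist, length = tok
            src = base - dist
            buf += buf[src:src + length]
            if any(dirty[src:src + length]):
                check = range(max(0, base - MAX_LEN + 1), base + length)
            else:
                # Jump: the copied interior is provably clean, so only the
                # offsets straddling the copy's two boundaries are checked.
                check = [*range(max(0, base - MAX_LEN + 1), base),
                         *range(max(base, base + length - MAX_LEN + 1),
                                base + length)]
        else:                            # literal bytes: scan them all
            buf += tok
            check = range(max(0, base - MAX_LEN + 1), base + len(tok))
        dirty.extend([False] * (len(buf) - len(dirty)))
        for i in check:
            if match_at(buf, i):
                dirty[i] = True
                hits.append(i)
    return hits

# The back-reference copies a clean 15-byte region, so its interior is
# never rescanned; "attack" is found only in the trailing literal.
print(scan_tokens([b"hello harmless ", (15, 15), b"attack!"]))   # -> [30]
```

Offsets near the tail of a copy are re-examined once later tokens arrive, so patterns that straddle a copy boundary are still caught even though the interior is skipped.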
Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2012AA012701).
Abstract: The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained by data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to resolve this problem. One loads the context into the CGRA at run time; it occupies very little on-chip memory but induces very large latency, which leads to low computational efficiency. The other adopts a multi-context structure, loading the context into on-chip context memory at the boot phase and broadcasting the pointer of a set of contexts to change the hardware configuration on a cycle-by-cycle basis. The size of the context memory induces a large area overhead in multi-context structures, which severely restricts application complexity. This paper proposes a Predictable Context Cache (PCC) architecture that addresses these context issues by buffering the context inside the CGRA. In this architecture, context is dynamically transferred into the CGRA. A PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by its size. For the data bandwidth issue, data preloading is the most frequently used approach to hiding input data latency and speeding up data transmission: rather than fundamentally reducing the amount of input data, it overlaps data transfer with computation. However, data preloading cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scales up. This paper therefore also presents a Hierarchical Data Memory (HDM) architecture as a solution to this efficiency problem. HDM provides high internal bandwidth to buffer both reused input data and intermediate data, relieving the external memory of the data transfer burden and significantly improving performance. Using PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%–19.48% with a reasonable memory size. As a result, 1080p@35.7 fps H.264 high-profile video decoding can be achieved on the PCC and HDM architecture at a 200 MHz working frequency. Furthermore, the size of the on-chip context memory no longer restricts complex applications, which execute efficiently on the PCC and HDM architecture.
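As a rough illustration of why buffering context inside the array helps, the following toy Python model replays a per-cycle context-ID trace through a small cache that preloads the predicted next context and counts stall cycles. The FIFO policy, the miss penalty, and the sequential next-context prediction are assumptions for illustration only; they are not the PCC microarchitecture, and the data-side HDM is not modeled.

```python
# Toy model of the context-delivery trade-off that PCC targets: a small
# in-array cache with next-context preloading versus per-cycle reloads.
from collections import deque

MISS_PENALTY = 20     # assumed cycles to stream one context from off-chip

def run(trace, cache_size, prefetch=True):
    """Replay a per-cycle context-ID trace through a tiny FIFO context cache."""
    cache = deque(maxlen=cache_size)
    cycles = 0
    for i, ctx in enumerate(trace):
        if ctx not in cache:
            cycles += MISS_PENALTY            # stall: fetch context on demand
            cache.append(ctx)
        cycles += 1                           # one configured execution cycle
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        if prefetch and nxt is not None and nxt not in cache:
            cache.append(nxt)                 # "predictable" part: preload next
    return cycles

# A loop kernel cycling through 4 contexts for 50 iterations.
trace = [0, 1, 2, 3] * 50
print(run(trace, cache_size=4))                   # 220: one cold miss, then hits
print(run(trace, cache_size=1, prefetch=False))   # 4200: reloads every cycle
```

Even this crude model shows the qualitative effect the abstract reports: once the working set of contexts is buffered near the array and the next context is prefetched, reconfiguration stalls all but disappear without storing every context on chip.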