With the explosion of network bandwidth and the ever-changing requirements for diverse network-based applications,the traditional processing architectures,i.e.,general purpose processor(GPP) and application specific...With the explosion of network bandwidth and the ever-changing requirements for diverse network-based applications,the traditional processing architectures,i.e.,general purpose processor(GPP) and application specific integrated circuits(ASIC) cannot provide sufficient flexibility and high performance at the same time.Thus,the network processor(NP) has emerged as an alternative to meet these dual demands for today's network processing.The NP combines embedded multi-threaded cores with a rich memory hierarchy that can adapt to different networking circumstances when customized by the application developers.In today's NP architectures,multithreading prevails over cache mechanism,which has achieved great success in GPP to hide memory access latencies.This paper focuses on the efficiency of the cache mechanism in an NP.Theoretical timing models of packet processing are established for evaluating cache efficiency and experiments are performed based on real-life network backbone traces.Testing results show that an improvement of nearly 70% can be gained in throughput with assistance from the cache mechanism.Accordingly,the cache mechanism is still efficient and irreplaceable in network processing,despite the existing of multithreading.展开更多
This paper describes a solution to build network-processor-based Radio Network Controller (RNC) in all-IP wireless networks, it includes the structure of the 3rd Generation (3G) wireless networks and the role of netw...This paper describes a solution to build network-processor-based Radio Network Controller (RNC) in all-IP wireless networks, it includes the structure of the 3rd Generation (3G) wireless networks and the role of network nodes, such as Base Station (BS), RNC, and Packet-Switched Core Networks (PSCN). The architecture of IXP2800 network processor; the detailed implementation of the solution on IXP2800-based RNC are also covered. This solution can provide scalable IP forward features and it will be widely used in 3G RNCs.展开更多
Today's firewalls and security gateways are required to not only block unauthorized accesses by authenticating packet headers, but also inspect flow payloads against malicious intrusions. Deep inspection emerges as a...Today's firewalls and security gateways are required to not only block unauthorized accesses by authenticating packet headers, but also inspect flow payloads against malicious intrusions. Deep inspection emerges as a seamless integration of packet classification for access control and pattern matching for intrusion prevention. The two function blocks are linked together via well-designed session lookup schemes. This paper presents an architecture-aware session lookup scheme for deep inspection on network processors (NPs). Test results show that the proposed session data structure and integration approach can achieve the OC-48 line rate (2.5 Gbps) with inline stateful content inspection on the Intel IXP2850 NP. This work provides an insight into application design and implementation on NPs and principles for performance tuning of NP-based programming such as data allocation, task partitioning, latency hiding, and thread synchronization.展开更多
High-performance network processors are expected to play an important role in future high-speed routers. This paper focuses on two representative techniques needed for high-performance network processors: hardwired lo...High-performance network processors are expected to play an important role in future high-speed routers. This paper focuses on two representative techniques needed for high-performance network processors: hardwired logic design and multithread design. Using hardwired logic, this paper compares a single-thread design with a multithread design, and proposes general models and principles to analyze the clock frequency and the resource cost for these environments. Then, two IP header processing schemes, one in single-thread mode and the other in double-thread mode, are developed using these principles and the implementation results verified the theoretical calculation.展开更多
Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. A...Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.展开更多
As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performa...As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performance to ASIC level while reserve the programmability of the traditional RISC based system. This paper covers both the hardware architecture and the software development environment architecture.展开更多
Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including...Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including the roles,the history,the current situations,and the trends.One trend is that ASIPs(Application Specific Instruction-set Processors) are taking over ASICs(Application Specific Integrated Circuits) because of the increasing needs both on performance and compatibility of multi-modes.The trend opened opportunities for researchers crossing the boundary between communications and computer architecture.Another trend is the serverlization,i.e.,more infrastructure equipments are replaced by servers.The trend opened opportunities for researchers working towards high performance computing for communication,such as research on communication algorithm kernels and real time programming methods on servers.展开更多
This paper reviewed two types of network storage technique: NAS and SAN. After comparing and analyzing, it concluded that the ultimate realization of network storage will be in the eventual convergence of NAS and SAN ...This paper reviewed two types of network storage technique: NAS and SAN. After comparing and analyzing, it concluded that the ultimate realization of network storage will be in the eventual convergence of NAS and SAN architectures. Currently, all the integration methods are based on the architecture level. This paper presented a device level integration scheme based on IXP1200 network processor. The device can be used as an NAS file server or an SAN’s storage node. Furthermore, it can be used as a bridge to connect NAS and SAN, and then be shared by their clients.展开更多
We cleveloped a high-speed information retrieval system. The system hased on the IXP 2800 is one of the dedicute device. The velocity of the information retrieval is 6.8 Gb/s. The protocol support Telnet, FTP, SMTP, P...We cleveloped a high-speed information retrieval system. The system hased on the IXP 2800 is one of the dedicute device. The velocity of the information retrieval is 6.8 Gb/s. The protocol support Telnet, FTP, SMTP, POP3 etc. various networks protocols. The information retrieval supports the key word and the natural language process. This paper explains the hardware system, software system and the index of the performance. Key words network processor - IXP2800 - information retrieval - IXA CLC number TP 309 Foundation item: Supported by the National Natural Science Foundation of China (69873016 & 69972017) and the National High Technology Development Program of China (863-301-06-1)Biography: SHI Shu-dong (1963-), male, Ph. D. candidate, research direction: network & information security.展开更多
Recent efforts to add new services to the wide-band code division multiple accesses (WCDMA) system have increased interest in network processor (NP)-based routers that are easy to extend and evolve. In this paper,...Recent efforts to add new services to the wide-band code division multiple accesses (WCDMA) system have increased interest in network processor (NP)-based routers that are easy to extend and evolve. In this paper, an application of NPs in routing engine module (REM) of radio network controller (RNC) in WCDMA system is proposed. The measuring results show that NPs have good performance and efficiency in routing traffic of the communication network and the simulation verifies the fast forwarding function of NPs.展开更多
This paper deals with an in-line network security processor (NSP) design that implements the Intemet Protocol Security (IPSec) protocol processing for the 10 Gbps Ethernet. The 10 Gbps high speed data transfer, th...This paper deals with an in-line network security processor (NSP) design that implements the Intemet Protocol Security (IPSec) protocol processing for the 10 Gbps Ethernet. The 10 Gbps high speed data transfer, the IPSec processing in- cluding the crypto-operation, the database query, and IPSec header processing are integrated in the design. The in-line NSP is implemented using 65 nm CMOS technology and the layout area is 2.5 mm^3 mm with 360 million gates. A configurable crossbar data transfer skeleton implementing an iSLIP scheduling algorithm is proposed, which enables simultaneous data transfer between the heterogeneous multiple cores. There are, in addition, a high speed input/output data buffering mechanism and design of high performance hardware structures for modules, wherein the transfer efficiency and the resource utilization are maximized and the IPSec protocol processing achieves 10 Gbps line speed. A high speed and low power hardware look-up method is proposed, which effectively reduces the area and power dissipation. The post simulation results demonstrate that the design gives a peak throughput for the Authentication Header (AH) transport mode of 10.06 Gbps with the average test packet length of 512 bytes under the clock rate of 250 MHz, and power dissipation less than 1 W is obtained. An FPGA prototype is constructed to verify the function of the design. A test bench is being set up for performance and function verification.展开更多
Task scheduling is an essential aspect of parallel process system. This NP-hard problem assumes fully connected homogeneous processors and ignores contention on the communication links. However, as arbitrary processor...Task scheduling is an essential aspect of parallel process system. This NP-hard problem assumes fully connected homogeneous processors and ignores contention on the communication links. However, as arbitrary processor network (APN), communication contention has a strong influence on the execution time of a parallel application. This paper investigates the incorporation of contention awareness into task scheduling. The innovation is the idea of dynamically scheduling edges to links, for which we use the earliest finish communication time search algorithm based on shortest-path search method. The other novel idea proposed in this paper is scheduling priority based on recursive rank computation on heterogeneous arbitrary processor network. In the end, to reduce time complexity of algorithm, a parallel algorithm is proposed and speedup O(PPE) is achieved. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithm significantly surpasses classic and static communication contention awareness algorithm, especially for high data transmission rate parallel application.展开更多
基金Supported by the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList)the National High-Tech Research and Development (863) Program of China (No.2007AA01Z468)
文摘With the explosion of network bandwidth and the ever-changing requirements for diverse network-based applications,the traditional processing architectures,i.e.,general purpose processor(GPP) and application specific integrated circuits(ASIC) cannot provide sufficient flexibility and high performance at the same time.Thus,the network processor(NP) has emerged as an alternative to meet these dual demands for today's network processing.The NP combines embedded multi-threaded cores with a rich memory hierarchy that can adapt to different networking circumstances when customized by the application developers.In today's NP architectures,multithreading prevails over cache mechanism,which has achieved great success in GPP to hide memory access latencies.This paper focuses on the efficiency of the cache mechanism in an NP.Theoretical timing models of packet processing are established for evaluating cache efficiency and experiments are performed based on real-life network backbone traces.Testing results show that an improvement of nearly 70% can be gained in throughput with assistance from the cache mechanism.Accordingly,the cache mechanism is still efficient and irreplaceable in network processing,despite the existing of multithreading.
文摘This paper describes a solution to build network-processor-based Radio Network Controller (RNC) in all-IP wireless networks, it includes the structure of the 3rd Generation (3G) wireless networks and the role of network nodes, such as Base Station (BS), RNC, and Packet-Switched Core Networks (PSCN). The architecture of IXP2800 network processor; the detailed implementation of the solution on IXP2800-based RNC are also covered. This solution can provide scalable IP forward features and it will be widely used in 3G RNCs.
基金Supported by the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList)the National High-Tech Research and Development (863) Programof China (No. 2007AA01Z468)
文摘Today's firewalls and security gateways are required to not only block unauthorized accesses by authenticating packet headers, but also inspect flow payloads against malicious intrusions. Deep inspection emerges as a seamless integration of packet classification for access control and pattern matching for intrusion prevention. The two function blocks are linked together via well-designed session lookup schemes. This paper presents an architecture-aware session lookup scheme for deep inspection on network processors (NPs). Test results show that the proposed session data structure and integration approach can achieve the OC-48 line rate (2.5 Gbps) with inline stateful content inspection on the Intel IXP2850 NP. This work provides an insight into application design and implementation on NPs and principles for performance tuning of NP-based programming such as data allocation, task partitioning, latency hiding, and thread synchronization.
基金Supported by the National High-Tech Research and Development (863) Program of China (No. 863-300-01-99) and the National Natural Science Foundation of China (No. 60173009)
文摘High-performance network processors are expected to play an important role in future high-speed routers. This paper focuses on two representative techniques needed for high-performance network processors: hardwired logic design and multithread design. Using hardwired logic, this paper compares a single-thread design with a multithread design, and proposes general models and principles to analyze the clock frequency and the resource cost for these environments. Then, two IP header processing schemes, one in single-thread mode and the other in double-thread mode, are developed using these principles and the implementation results verified the theoretical calculation.
基金Sponsored by the National Defence Research Foundation of China(Grant No.413460303).
文摘Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.
文摘As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performance to ASIC level while reserve the programmability of the traditional RISC based system. This paper covers both the hardware architecture and the software development environment architecture.
基金The National High-Tech Research and Development Program of China(863 Program)2014AA01A705
文摘Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including the roles,the history,the current situations,and the trends.One trend is that ASIPs(Application Specific Instruction-set Processors) are taking over ASICs(Application Specific Integrated Circuits) because of the increasing needs both on performance and compatibility of multi-modes.The trend opened opportunities for researchers crossing the boundary between communications and computer architecture.Another trend is the serverlization,i.e.,more infrastructure equipments are replaced by servers.The trend opened opportunities for researchers working towards high performance computing for communication,such as research on communication algorithm kernels and real time programming methods on servers.
文摘This paper reviewed two types of network storage technique: NAS and SAN. After comparing and analyzing, it concluded that the ultimate realization of network storage will be in the eventual convergence of NAS and SAN architectures. Currently, all the integration methods are based on the architecture level. This paper presented a device level integration scheme based on IXP1200 network processor. The device can be used as an NAS file server or an SAN’s storage node. Furthermore, it can be used as a bridge to connect NAS and SAN, and then be shared by their clients.
文摘We cleveloped a high-speed information retrieval system. The system hased on the IXP 2800 is one of the dedicute device. The velocity of the information retrieval is 6.8 Gb/s. The protocol support Telnet, FTP, SMTP, POP3 etc. various networks protocols. The information retrieval supports the key word and the natural language process. This paper explains the hardware system, software system and the index of the performance. Key words network processor - IXP2800 - information retrieval - IXA CLC number TP 309 Foundation item: Supported by the National Natural Science Foundation of China (69873016 & 69972017) and the National High Technology Development Program of China (863-301-06-1)Biography: SHI Shu-dong (1963-), male, Ph. D. candidate, research direction: network & information security.
文摘Recent efforts to add new services to the wide-band code division multiple accesses (WCDMA) system have increased interest in network processor (NP)-based routers that are easy to extend and evolve. In this paper, an application of NPs in routing engine module (REM) of radio network controller (RNC) in WCDMA system is proposed. The measuring results show that NPs have good performance and efficiency in routing traffic of the communication network and the simulation verifies the fast forwarding function of NPs.
基金Project (No. 2011ZX01034-002-002-003) supported by the National Science and Technology Major Projects of the Ministry of Industry and Information Technology, China
文摘This paper deals with an in-line network security processor (NSP) design that implements the Intemet Protocol Security (IPSec) protocol processing for the 10 Gbps Ethernet. The 10 Gbps high speed data transfer, the IPSec processing in- cluding the crypto-operation, the database query, and IPSec header processing are integrated in the design. The in-line NSP is implemented using 65 nm CMOS technology and the layout area is 2.5 mm^3 mm with 360 million gates. A configurable crossbar data transfer skeleton implementing an iSLIP scheduling algorithm is proposed, which enables simultaneous data transfer between the heterogeneous multiple cores. There are, in addition, a high speed input/output data buffering mechanism and design of high performance hardware structures for modules, wherein the transfer efficiency and the resource utilization are maximized and the IPSec protocol processing achieves 10 Gbps line speed. A high speed and low power hardware look-up method is proposed, which effectively reduces the area and power dissipation. The post simulation results demonstrate that the design gives a peak throughput for the Authentication Header (AH) transport mode of 10.06 Gbps with the average test packet length of 512 bytes under the clock rate of 250 MHz, and power dissipation less than 1 W is obtained. An FPGA prototype is constructed to verify the function of the design. A test bench is being set up for performance and function verification.
基金Supported by the National Natural Science Foundation of China (Grant Nos. 90715029 and 60603053)the Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Edacation of Chinathe Key Project of Science & Technology of Hunan Province(Grant No. 2006GK2006)
文摘Task scheduling is an essential aspect of parallel process system. This NP-hard problem assumes fully connected homogeneous processors and ignores contention on the communication links. However, as arbitrary processor network (APN), communication contention has a strong influence on the execution time of a parallel application. This paper investigates the incorporation of contention awareness into task scheduling. The innovation is the idea of dynamically scheduling edges to links, for which we use the earliest finish communication time search algorithm based on shortest-path search method. The other novel idea proposed in this paper is scheduling priority based on recursive rank computation on heterogeneous arbitrary processor network. In the end, to reduce time complexity of algorithm, a parallel algorithm is proposed and speedup O(PPE) is achieved. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithm significantly surpasses classic and static communication contention awareness algorithm, especially for high data transmission rate parallel application.