The most popular hardware used for parallel depth migration is the PC-Cluster but its application is limited due to large space occupation and high power consumption. In this paper, we introduce a new hardware archite...The most popular hardware used for parallel depth migration is the PC-Cluster but its application is limited due to large space occupation and high power consumption. In this paper, we introduce a new hardware architecture, based on which the finite difference (FD) wavefield-continuation depth migration can be conducted using the Graphics Processing Unit (GPU) as a CPU coprocessor. We demonstrate the program module and three key optimization steps for implementing FD depth migration: memory, thread structure, and instruction optimizations and consider evaluation methods for the amount of optimization. 2D and 3D models are used to test depth migration on the GPU. The tested results show that the depth migration computational efficiency greatly increased using the general-purpose GPU, increasing by at least 25 times compared to the AMD 2.5 GHz CPU.展开更多
The 32-bit extensible embedded processor RISC3200 originating from an RTL prototype core is intended for low-cost consumer multimedia products. In order to incorporate the reduced instruction set and the multimedia ex...The 32-bit extensible embedded processor RISC3200 originating from an RTL prototype core is intended for low-cost consumer multimedia products. In order to incorporate the reduced instruction set and the multimedia extension instruction set in a unifying pipeline, a scalable super-pipeline technique is adopted. Several other optimization techniques are proposed to boost the frequency and reduce the average CPI of the unifying pipeline. Based on a data flow graph (DFG) with delay information, the critical path of the pipeline stage can be located and shortened. This paper presents a distributed data bypass unit and a centralized pipeline control scheme for achieving lower CPI. Synthesis and simulation showed that the optimization techniques enable RISC3200 to operate at 200 MHz with an average CPI of 1.16. The core was integrated into a media SOC chip taped out in SMIC 0.18-micron technology. Preliminary testing result showed that the processor works well as we expected.展开更多
A new efficient adapting virtual intermediate instruction set,V-IIS,is designed and implemented towards the optimized dynamic binary translator (DBT) system.With the help of this powerful but previously little-studied...A new efficient adapting virtual intermediate instruction set,V-IIS,is designed and implemented towards the optimized dynamic binary translator (DBT) system.With the help of this powerful but previously little-studied component,DBTs can not only get rid of the dependence of machine(s),but also get better performance.From our systematical study and evaluation,experimental results demonstrate that if V-IIS is well designed,without affecting the other optimizing measures,this could make DBT's performance close to those who do not have intermediate instructions.This study is an important step towards the grand goal of high performance "multi-source" and "multi-target" dynamic binary translation.展开更多
Defending against return-oriented programing(ROP) attacks is extremely challenging for modern operating systems.As the most popular mobile OS running on ARM,Android is even more vulnerable to ROP attacks due to its we...Defending against return-oriented programing(ROP) attacks is extremely challenging for modern operating systems.As the most popular mobile OS running on ARM,Android is even more vulnerable to ROP attacks due to its weak implementation of ASLR and the absence of effective control-flow integrity enforcement.In this paper,leveraging specific ARM features,an instruction randomization strategy to mitigate ROP attacks in Android even with the threat of single pointer leakage vulnerabilities is proposed.By popping out more registers in functions' epilogue instructions and reallocating registers in function scopes,branch targets in all(direct and indirect) branch instructions potential to be ROP gadgets are changed randomly.Without the knowledge of binaries' runtime instructions layout,adversary's repeated control flow transfer in ROP exploits will be subverted.Furthermore,this instruction randomization idea has been implemented in both Android Dalvik runtime and ART.Corresponding evaluations proved it is capable to introduce enough randomness for more than 99% discovered functions and thwart about 95% ROP gadgets in application's shared libraries and oat file compiled from Dalvik bytecode.Besides,evaluations on real-world exploits also confirmed its effectiveness on mitigating ROP attacks within acceptable performance overhead.展开更多
基金supported by the National Natural Science Foundation of China (Nos. 41104083 and 40804024) Fundamental Research Funds for the Central Universities (No, 2011YYL022)
文摘The most popular hardware used for parallel depth migration is the PC-Cluster but its application is limited due to large space occupation and high power consumption. In this paper, we introduce a new hardware architecture, based on which the finite difference (FD) wavefield-continuation depth migration can be conducted using the Graphics Processing Unit (GPU) as a CPU coprocessor. We demonstrate the program module and three key optimization steps for implementing FD depth migration: memory, thread structure, and instruction optimizations and consider evaluation methods for the amount of optimization. 2D and 3D models are used to test depth migration on the GPU. The tested results show that the depth migration computational efficiency greatly increased using the general-purpose GPU, increasing by at least 25 times compared to the AMD 2.5 GHz CPU.
基金Project supported by the Hi-Tech Research and Development Pro-gram (863) of China (No. 2002 AA1Z1140) and the Fork Ying TongEducation Foundation (No. 94031), China
文摘The 32-bit extensible embedded processor RISC3200 originating from an RTL prototype core is intended for low-cost consumer multimedia products. In order to incorporate the reduced instruction set and the multimedia extension instruction set in a unifying pipeline, a scalable super-pipeline technique is adopted. Several other optimization techniques are proposed to boost the frequency and reduce the average CPI of the unifying pipeline. Based on a data flow graph (DFG) with delay information, the critical path of the pipeline stage can be located and shortened. This paper presents a distributed data bypass unit and a centralized pipeline control scheme for achieving lower CPI. Synthesis and simulation showed that the optimization techniques enable RISC3200 to operate at 200 MHz with an average CPI of 1.16. The core was integrated into a media SOC chip taped out in SMIC 0.18-micron technology. Preliminary testing result showed that the processor works well as we expected.
基金Projects(12R21414600)supported by Shanghai Municipal Science and Technology Commission,China
文摘A new efficient adapting virtual intermediate instruction set,V-IIS,is designed and implemented towards the optimized dynamic binary translator (DBT) system.With the help of this powerful but previously little-studied component,DBTs can not only get rid of the dependence of machine(s),but also get better performance.From our systematical study and evaluation,experimental results demonstrate that if V-IIS is well designed,without affecting the other optimizing measures,this could make DBT's performance close to those who do not have intermediate instructions.This study is an important step towards the grand goal of high performance "multi-source" and "multi-target" dynamic binary translation.
基金supported by the National Natural Science Foundation of China(Grant No.61202387,61332019 and 61373168)the National Basic Research Program of China(“973”Program)(Grant No.2014CB340600)
文摘Defending against return-oriented programing(ROP) attacks is extremely challenging for modern operating systems.As the most popular mobile OS running on ARM,Android is even more vulnerable to ROP attacks due to its weak implementation of ASLR and the absence of effective control-flow integrity enforcement.In this paper,leveraging specific ARM features,an instruction randomization strategy to mitigate ROP attacks in Android even with the threat of single pointer leakage vulnerabilities is proposed.By popping out more registers in functions' epilogue instructions and reallocating registers in function scopes,branch targets in all(direct and indirect) branch instructions potential to be ROP gadgets are changed randomly.Without the knowledge of binaries' runtime instructions layout,adversary's repeated control flow transfer in ROP exploits will be subverted.Furthermore,this instruction randomization idea has been implemented in both Android Dalvik runtime and ART.Corresponding evaluations proved it is capable to introduce enough randomness for more than 99% discovered functions and thwart about 95% ROP gadgets in application's shared libraries and oat file compiled from Dalvik bytecode.Besides,evaluations on real-world exploits also confirmed its effectiveness on mitigating ROP attacks within acceptable performance overhead.