With recent efforts to build foundational certified software systems, two different approaches have been proposed to certify thread context switching. One is to certify both threads and context switching in a single l...With recent efforts to build foundational certified software systems, two different approaches have been proposed to certify thread context switching. One is to certify both threads and context switching in a single logic system, and the other certifies threads and context switching at different abstraction levels. The former requires heavyweight extensions in the logic system to support first-class code pointers and recursive specifications. Moreover, the specification for context switching is very complex. The latter supports simpler and more natural specifications, but it requires the contexts of threads to be abstracted away completely when threads are certified. As a result, the conventional implementation of context switching used in most systems needs to be revised to make the abstraction work. In this paper, we extend the second approach to certify the conventional implementation, where the clear abstraction for threads is unavailable since both threads and context switching hold pointers of thread contexts. To solve this problem, we allow the program specifications for threads to refer to pointers of thread contexts. Thread contexts are treated as opaque structures, whose contents are unspecified and should never be accessed by the code of threads. Therefore, the advantage of avoiding the direct support of first-class code pointers is still preserved in our method. Besides, our new approach is also more lightweight. Instead of using two different logics to certify threads and context switching, we employ only one program logic with two different specifications for the context switching. One is used to certify the implementation itself, and the more abstract one is used as an interface between threads and context switching at a higher abstraction level. The consistency between the two specifications are enforced by the global program invariant.展开更多
The performance of scalable shared-memory multiprocessors suffers from three types of latency; memory latency, the latency caused by inter-process synchronization ,and the latency caused by instructions that take mult...The performance of scalable shared-memory multiprocessors suffers from three types of latency; memory latency, the latency caused by inter-process synchronization ,and the latency caused by instructions that take multiple cycles to produce results To tolerate these three types of latencies, The following techniques was proposed to couple: coarse-grained multithreading, the superscalar processor and a reconfigurable device, namely the overlapping long latency operations of one thread of computation with the execution of other threads The superscalar processor principle is used to tolerate instruction latency by issuing several instructions simultaneously The DPGA is coupled with this processor in order to improve the context-switching展开更多
The Rate Distortion Optimization(RDO)algorithm in High Efficiency Video Coding(HEVC)has many iterations and a large number of calculations.In order to decrease the calculation time and meet the requirements of fast sw...The Rate Distortion Optimization(RDO)algorithm in High Efficiency Video Coding(HEVC)has many iterations and a large number of calculations.In order to decrease the calculation time and meet the requirements of fast switching of RDO algorithms of different scales,an RDO dynamic reconfigurable structure is proposed.First,the Quantization Parameter(QP)and bit rate values were loaded through an H⁃tree Configurable Network(HCN),and the execution status of the array was detected in real time.When the switching request of the RDO algorithm was detected,the corresponding configuration information was delivered.This self⁃reconfiguration implementation method improved the flexibility and utilization of hardware.Experimental results show that when the control bit width was only increased by 31.25%,the designed configuration network could increase the number of controllable processing units by 32 times,and the execution cycle was 50%lower than the same type of design.Compared with previous RDO algorithm,the RDO algorithm implemented on the reconfigurable array based on the configuration network had an average operating frequency increase of 12.5%and an area reduction of 56.4%.展开更多
Inline assembly code is common in system software to interact with the underlying hardware platforms. The safety and correctness of the assembly code is crucial to guarantee the safety of the whole system. In this pap...Inline assembly code is common in system software to interact with the underlying hardware platforms. The safety and correctness of the assembly code is crucial to guarantee the safety of the whole system. In this paper, we propose a practical Hoare-style program logic for verifying SPARC (Scalable Processor Architecture) assembly code. The logic supports modular reasoning about the main features of SPARCv8 ISA (instruction set architecture), including delayed control transfers, delayed writes to special registers, and register windows. It also supports relational reasoning for refinement verification. We have applied it to verify that there is a contextual refinement between a context switch routine in SPARCv8 and a switch primitive. The program logic and its soundness proof have been mechanized in Coq.展开更多
基金Supported by the National Natural Science Foundation of China under Grant Nos.90718026 and 60928004,ChinaPostdoctoral Science Foundation under Grant No.20080430770Natural Science Foundation of Jiangsu Province,China under Grant No. BK2008181
文摘With recent efforts to build foundational certified software systems, two different approaches have been proposed to certify thread context switching. One is to certify both threads and context switching in a single logic system, and the other certifies threads and context switching at different abstraction levels. The former requires heavyweight extensions in the logic system to support first-class code pointers and recursive specifications. Moreover, the specification for context switching is very complex. The latter supports simpler and more natural specifications, but it requires the contexts of threads to be abstracted away completely when threads are certified. As a result, the conventional implementation of context switching used in most systems needs to be revised to make the abstraction work. In this paper, we extend the second approach to certify the conventional implementation, where the clear abstraction for threads is unavailable since both threads and context switching hold pointers of thread contexts. To solve this problem, we allow the program specifications for threads to refer to pointers of thread contexts. Thread contexts are treated as opaque structures, whose contents are unspecified and should never be accessed by the code of threads. Therefore, the advantage of avoiding the direct support of first-class code pointers is still preserved in our method. Besides, our new approach is also more lightweight. Instead of using two different logics to certify threads and context switching, we employ only one program logic with two different specifications for the context switching. One is used to certify the implementation itself, and the more abstract one is used as an interface between threads and context switching at a higher abstraction level. The consistency between the two specifications are enforced by the global program invariant.
文摘The performance of scalable shared-memory multiprocessors suffers from three types of latency; memory latency, the latency caused by inter-process synchronization ,and the latency caused by instructions that take multiple cycles to produce results To tolerate these three types of latencies, The following techniques was proposed to couple: coarse-grained multithreading, the superscalar processor and a reconfigurable device, namely the overlapping long latency operations of one thread of computation with the execution of other threads The superscalar processor principle is used to tolerate instruction latency by issuing several instructions simultaneously The DPGA is coupled with this processor in order to improve the context-switching
基金Sponsored by the National Natural Science Foundation of China(Grant Nos.61834005,61772417,61802304,61602377,and 61634004)the Shaanxi Province Coordination Innovation Project of Science and Technology(Grant No.2016KTZDGY02-04-02)+1 种基金the Shaanxi Provincial Key R&D Plan(Grant No.2017GY-060)the Shaanxi International Science and Technology Cooperation Program(Grant No.2018KW-006).
文摘The Rate Distortion Optimization(RDO)algorithm in High Efficiency Video Coding(HEVC)has many iterations and a large number of calculations.In order to decrease the calculation time and meet the requirements of fast switching of RDO algorithms of different scales,an RDO dynamic reconfigurable structure is proposed.First,the Quantization Parameter(QP)and bit rate values were loaded through an H⁃tree Configurable Network(HCN),and the execution status of the array was detected in real time.When the switching request of the RDO algorithm was detected,the corresponding configuration information was delivered.This self⁃reconfiguration implementation method improved the flexibility and utilization of hardware.Experimental results show that when the control bit width was only increased by 31.25%,the designed configuration network could increase the number of controllable processing units by 32 times,and the execution cycle was 50%lower than the same type of design.Compared with previous RDO algorithm,the RDO algorithm implemented on the reconfigurable array based on the configuration network had an average operating frequency increase of 12.5%and an area reduction of 56.4%.
基金This work was supported by the National Natural Science Foundation of China under Grant No.61632005.
文摘Inline assembly code is common in system software to interact with the underlying hardware platforms. The safety and correctness of the assembly code is crucial to guarantee the safety of the whole system. In this paper, we propose a practical Hoare-style program logic for verifying SPARC (Scalable Processor Architecture) assembly code. The logic supports modular reasoning about the main features of SPARCv8 ISA (instruction set architecture), including delayed control transfers, delayed writes to special registers, and register windows. It also supports relational reasoning for refinement verification. We have applied it to verify that there is a contextual refinement between a context switch routine in SPARCv8 and a switch primitive. The program logic and its soundness proof have been mechanized in Coq.