Funding: Supported by the National Science Foundation of China (41302051), the Natural Science Foundation of Shaanxi Province (2020JM-311), and a project of the China Geological Survey (DD20221636, DD20221691).
Abstract: 1. Objective. The West Kunlun in Xinjiang is located on the northwestern margin of the Qinghai-Tibet Plateau (Fig. 1a) and at the junction of the Paleo-Asian tectonic domain and the Tethys tectonic domain. Because of this special tectonic position, it is an important area for studying the geologic evolution of the Karakorum-West Kunlun.
Funding: Supported by the National Natural Science Foundation of China, No. 82072425.
Abstract: BACKGROUND A decreased autophagic capacity of bone marrow mesenchymal stromal cells (BMSCs) has been suggested to be an important cause of decreased osteogenic differentiation. A pharmacological increase in autophagy of BMSCs is a potential therapeutic option to increase osteoblast viability and ameliorate osteoporosis. AIM To explore the effects of sinomenine (SIN) on the osteogenic differentiation of BMSCs and the underlying mechanisms. METHODS For in vitro experiments, BMSCs were extracted from sham-treated mice and ovariectomized mice, and the levels of autophagy markers and osteogenic differentiation were examined after treatment with the appropriate concentrations of SIN and the autophagy inhibitor 3-methyladenine. In vivo, the therapeutic effect of SIN was verified by establishing an ovariectomy-induced mouse model and by morphological and histological assays of the mouse femur. RESULTS SIN reduced the levels of AKT and mammalian target of rapamycin (mTOR) phosphorylation in the phosphatidylinositol 3-kinase (PI3K)/AKT/mTOR signaling pathway, inhibited mTOR activity, and increased the autophagic capacity of BMSCs, thereby promoting the osteogenic differentiation of BMSCs and effectively alleviating bone loss in ovariectomized mice in vivo. CONCLUSION The Chinese medicine SIN has potential for the treatment of various types of osteoporosis, bone homeostasis disorders, and autophagy-related diseases.
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200501, the National Natural Science Foundation of China under Grant Nos. 61332009 and 61521092, the Open Project Program of State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2016A04, the Beijing Municipal Science and Technology Commission under Grant No. Z15010101009, the Open Project Program of State Key Laboratory of Computer Architecture under Grant No. CARCH201503, China Scholarship Council, and Beijing Advanced Innovation Center for Imaging Technology.
Abstract: With the coming of the exascale supercomputing era, power efficiency has become the most important obstacle to building an exascale system. Dataflow architecture has a native advantage in achieving high power efficiency for scientific applications. However, state-of-the-art dataflow architectures fail to exploit high parallelism for loop processing. To address this issue, we propose a pipelining loop optimization method (PLO), which makes the iterations of a loop flow through the processing element (PE) array of a dataflow accelerator. This method consists of two techniques: architecture-assisted hardware iteration and instruction-assisted software iteration. In the hardware iteration execution model, an on-chip loop controller is designed to generate loop indexes, reducing the complexity of the computing kernel and laying a good foundation for pipelined execution. In the software iteration execution model, additional loop instructions are presented to solve the iteration dependency problem. Via these two techniques, the average number of instructions ready to execute per cycle is increased to keep the floating-point units busy. Simulation results show that our proposed method outperforms the static and dynamic loop execution models in floating-point efficiency by 2.45x and 1.1x on average, respectively, while the hardware cost of these two techniques is acceptable.
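To make the benefit of letting loop iterations flow through the PE array concrete, the following is a minimal timing sketch in Python. It is an idealized model, not the authors' PLO implementation: it assumes a hypothetical pipeline of `depth` stages, one new iteration entering per cycle, and no dependencies between iterations. The function names and parameters are illustrative only.

```python
def serial_cycles(n_iter: int, depth: int) -> int:
    """Cycles when each iteration must leave the array before the next enters."""
    return n_iter * depth


def pipelined_cycles(n_iter: int, depth: int) -> int:
    """Cycles when a new iteration enters the array every cycle.

    Iteration i occupies stage s during cycle i + s, so the last iteration
    leaves the last stage at cycle (n_iter - 1) + (depth - 1).
    """
    if n_iter == 0:
        return 0
    last_cycle = (n_iter - 1) + (depth - 1)
    return last_cycle + 1


if __name__ == "__main__":
    n, d = 100, 8  # hypothetical loop trip count and pipeline depth
    print(serial_cycles(n, d), pipelined_cycles(n, d))
```

Under these assumptions the pipelined schedule approaches one iteration per cycle as the trip count grows; iteration dependencies, which the abstract says are handled by the additional loop instructions, would reduce that gain.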
Funding: Supported by the National 973 Project (G.19990751) and the Doctoral Programme Foundation of Institution of Higher Education of China (20040027001).
Abstract: Let [b, T] be the commutator generated by a Lipschitz function b ∈ Lip(β) (0 < β < 1) and a multiplier T. The authors studied the boundedness of [b, T] on the Lebesgue spaces and Hardy spaces.
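For reference, the objects named in this abstract are usually defined as follows. These are the standard textbook definitions, assumed here rather than quoted from the paper; the symbol m for the multiplier's symbol is introduced only for illustration.

```latex
% Commutator of a function b with an operator T
[b, T]f(x) = b(x)\,Tf(x) - T(bf)(x)

% T as a Fourier multiplier with symbol m (m is an assumed name)
\widehat{Tf}(\xi) = m(\xi)\,\widehat{f}(\xi)

% Lipschitz class Lip(\beta), 0 < \beta < 1
\|b\|_{\mathrm{Lip}(\beta)} = \sup_{x \neq y} \frac{|b(x) - b(y)|}{|x - y|^{\beta}} < \infty
```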
Funding: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2015AA01A301, the National Natural Science Foundation of China under Grant No. 61332009, the National HeGaoJi Project of China under Grant No. 2013ZX0102-8001-001-001, and the Beijing Municipal Science and Technology Commission under Grant Nos. Z15010101009 and Z151100003615006.
Abstract: Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data is frequently transferred among processing elements through the network-on-chip (NoC). Thus the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture, and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination transfers; thus it can deliver data with multiple destinations in a single transfer. Moreover, the router adopts an output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
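The advantage of delivering data to multiple destinations in a single transfer can be illustrated with a small Python sketch that counts link traversals on a 2D mesh. The abstract does not describe the router's routing algorithm, so dimension-ordered (X-then-Y) routing and the mesh coordinates below are assumptions made purely for illustration.

```python
from typing import Iterable, List, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


def xy_path(src: Node, dst: Node) -> List[Link]:
    """Links traversed by dimension-ordered routing: X first, then Y."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_link_traversals(src: Node, dsts: Iterable[Node]) -> int:
    """One separate packet per destination: every path is paid in full."""
    return sum(len(xy_path(src, d)) for d in dsts)


def multicast_link_traversals(src: Node, dsts: Iterable[Node]) -> int:
    """One multi-destination packet: links shared by several paths count once."""
    shared: Set[Link] = set()
    for d in dsts:
        shared.update(xy_path(src, d))
    return len(shared)


if __name__ == "__main__":
    src = (0, 0)
    dsts = [(2, 0), (2, 1), (2, 2)]  # hypothetical consumer PEs
    print(unicast_link_traversals(src, dsts), multicast_link_traversals(src, dsts))
```

In this example the three unicast packets cross 9 links in total, while a single multi-destination packet crosses 4, because the paths share their common prefix.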
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200501, the National Natural Science Foundation of China under Grant Nos. 61332009 and 61521092, the Open Project Program of State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2016A04, and the Beijing Municipal Science and Technology Commission under Grant No. Z15010101009.
Abstract: Double buffering is an effective mechanism to hide the latency of data transfers between on-chip and off-chip memory. However, in dataflow architecture, the swapping of two buffers during the execution of many tiles decreases the performance because of repetitive filling and draining of the dataflow accelerator. In this work, we propose a non-stop double buffering mechanism for dataflow architecture. The proposed non-stop mechanism assigns tiles to the processing element array without stopping the execution of processing elements, by optimizing the control logic in dataflow architecture. Moreover, we propose a work-flow program to cooperate with the non-stop double buffering mechanism. After optimizations on both the control logic and the work-flow program, the filling and draining of the array needs to be done only once across the execution of all tiles belonging to the same dataflow graph. Experimental results show that the proposed double buffering mechanism for dataflow architecture achieves a 16.2% average efficiency improvement over that without the optimization.
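The cost of repeated filling and draining can be sketched with a simple cycle-count model in Python. This is an illustrative model under stated assumptions, not the authors' simulator: `fill`, `steady`, and `drain` are hypothetical per-tile cycle counts, and data transfers are assumed to be fully hidden by the double buffer in both cases.

```python
def stop_and_swap_cycles(n_tiles: int, fill: int, steady: int, drain: int) -> int:
    """Array is filled and drained around every tile (buffer swap stops the PEs)."""
    return n_tiles * (fill + steady + drain)


def non_stop_cycles(n_tiles: int, fill: int, steady: int, drain: int) -> int:
    """Array is filled once and drained once across all tiles of one dataflow graph."""
    if n_tiles == 0:
        return 0
    return fill + n_tiles * steady + drain


if __name__ == "__main__":
    # Hypothetical numbers: 10 tiles, 4 cycles to fill, 20 of steady work, 4 to drain.
    print(stop_and_swap_cycles(10, 4, 20, 4), non_stop_cycles(10, 4, 20, 4))
```

With these made-up parameters the non-stop schedule saves the fill and drain cost of every tile but the first and last; the actual improvement depends on how large the fill and drain phases are relative to the steady-state work of a tile.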
Funding: This work was supported by the Project of the State Grid Corporation of China in 2020 "Integration Technology Research and Prototype Development for High End Controller Chip" under Grant No. 5700-202041264A-0-0-00.
Abstract: The dataflow architecture, which is characterized by the lack of redundant unified control logic, has been shown to have an advantage over the control-flow architecture as it improves computational performance and power efficiency, especially for applications used in high-performance computing (HPC). Importantly, the high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated simultaneously. Therefore, a proper acknowledgment mechanism is required to distinguish the data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism, in which the data is sent before acknowledgments are received but retried after rejection, or a handshake mechanism, in which the data is only sent after acknowledgments are received. However, these mechanisms are characterized by both inefficient data transfer and increased area cost. Good performance of the dataflow architecture depends on the efficiency of data transfer. In order to optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that LAA increases the average utilization of computational units by 23.9%, with a reduction in the average execution time of 17.4% and an increase in the average power efficiency of dataflow processors of 22.4%. Crucially, our approach results in a relatively small increase, of less than 0.9%, in the area and power consumption of the on-chip logic. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.