A novel kind of multi-core magnetic composite particles, the surfaces of which were respectively mo- dified with goat-anti-mouse IgG and antitransferrin receptor(anti-CD71), was prepared. The fetal nucleated red blo...A novel kind of multi-core magnetic composite particles, the surfaces of which were respectively mo- dified with goat-anti-mouse IgG and antitransferrin receptor(anti-CD71), was prepared. The fetal nucleated red blood cells(FNRBCs) in the peripheral blood of a gravida were rapidly and effectively enriched and separated by the mo- dified multi-core magnetic composite particles in an external magnetic field. The obtained FNRBCs were used for the identification of the fetal sex by means of fluorescence in situ hybridization(FISH) technique. The results demonstrate that the multi-core magnetic composite particles meet the requirements for the enrichment and speration of FNRBCs with a low concentration and the accuracy of detetion for the diagnosis of fetal sex reached to 95%. Moreover, the obtained FNRBCs were applied to the non-invasive diagnosis of Down syndrome and chromosome 3p21 was de- tected. The above facts indicate that the novel multi-core magnetic composite particles-based method is simple, relia- ble and cost-effective and has opened up vast vistas for the potential application in clinic non-invasive prenatal diag- nosis.展开更多
Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at t...Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.展开更多
A variation-aware task mapping approach is proposed for a multi-core network-on-chips with redundant cores, which includes both the design-time mapping and run-time scheduling algorithms. Firstly, a design-time geneti...A variation-aware task mapping approach is proposed for a multi-core network-on-chips with redundant cores, which includes both the design-time mapping and run-time scheduling algorithms. Firstly, a design-time genetic task mapping algorithm is proposed during the design stage to generate multiple task mapping solutions which cover a maximum range of chips. Then, during the run, one optimal task mapping solution is selected. Additionally, logical cores are mapped to physically available cores. Both core asymmetry and topological changes are considered in the proposed approach. Experimental results show that the performance yield of the proposed approach is 96% on average, and the communication cost, power consumption and peak temperature are all optimized without loss of performance yield.展开更多
Decreasing mode coupling coefficient(κ) is an effective approach to suppress the inter-core crosstalk. Therefore, we deploy a low index rod and rectangle trench in the middle of two neighboring cores to reduce κ so ...Decreasing mode coupling coefficient(κ) is an effective approach to suppress the inter-core crosstalk. Therefore, we deploy a low index rod and rectangle trench in the middle of two neighboring cores to reduce κ so that the overlap of electric field distribution can be suppressed. We also propose approximate analytical solution(AAS) for κ of two crosstalk suppression models, which are two cores with one low index rod deployed in the middle and two cores with one low index rectangle trench deployed in the middle. We then do some modification for the results obtained by AAS and the modified results are proved to agree well with that obtained by finite element method(FEM). Therefore, we can use the modified AAS to get inter-core crosstalk for abovementioned two models quickly.展开更多
This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication(SpGEMM).It classifies the cache misses into two categories;one is caused by the irregular distribution pattern of the ...This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication(SpGEMM).It classifies the cache misses into two categories;one is caused by the irregular distribution pattern of the multiplier-matrix,and the other is caused by the multiplicand.For each of them,the paper puts forward an optimization method respectively.The first hash based method removes cache misses of the 1 st category effectively,and improves the performance by a factor of 6 on an Intel 8-core CPU for the best cases.For cache misses of the 2nd category,it proposes a new cache replacement algorithm,which achieves a cache hit rate much higher than other historical knowledge based algorithms,and the algorithm is applicable on CELL and GPU.To further verify the effectiveness of our methods,we implement our algorithm on GPU,and the performance perfectly scales with the size of on-chip storage.展开更多
In this paper, the influencing factors that affect few-mode and multi core optical fiber channel are analyzed in a comprehensive way. The theoretical modeling and computer simulation of the information channel are car...In this paper, the influencing factors that affect few-mode and multi core optical fiber channel are analyzed in a comprehensive way. The theoretical modeling and computer simulation of the information channel are carried out and then the modeling scheme of few-mode multicore optical fiber channel based on non-uniform mode field distribution is put forward. The proposed modeling scheme can not only exponentially increases the system capacity through fewmode multi-core optical fiber channel, but has better transmission performance compared to the channel of the same type to the uniform channel revealing from the simulation results.展开更多
In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of pa...In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of parallel processing mechanisms.One is that it can evenly allocate tasks to each server node in the cluster and the other is that it can implement the load balancing inside a server node.Based on the strategy,a new web-based spatial computing model is designed in this paper,in which,a task response ratio calculation method,a request queue buffer mechanism and a thread scheduling strategy are focused on.Experimental results show that the new model can fully use the multi-core computing advantage of each server node in the concurrent access environment and improve the average hits per second,average I/O Hits,CPU utilization and throughput.Using speed-up ratio to analyze the traditional model and the new one,the result shows that the new model has the best performance.The performance of the multi-core server nodes in the cluster is optimized;the resource utilization and the parallel processing capabilities are enhanced.The more CPU cores you have,the higher parallel processing capabilities will be obtained.展开更多
The state-of-the-art multi-core computer systems are based on Very Large Scale three Dimensional (3D) Integrated circuits (VLSI). In order to provide high-speed vertical data transmission in such 3D systems, efficient...The state-of-the-art multi-core computer systems are based on Very Large Scale three Dimensional (3D) Integrated circuits (VLSI). In order to provide high-speed vertical data transmission in such 3D systems, efficient Through-Silicon Via (TSV) technology is critically important. In this paper, various Radio Frequency (RF) TSV designs and models are proposed. Specifically, the Cu-plug TSV with surrounding ground TSVs is used as the baseline structure. For further improvement, the dielectric coaxial and novel air-gap coaxial TSVs are introduced. Using the empirical parameters of these coaxial TSVs, the simulation results are obtained demonstrating that these coaxial RF-TSVs can provide two-order higher of cut-off frequencies than the Cu-plug TSVs. Based on these new RF-TSV technologies, we propose a novel 3D multi-core computer system as well as new architectures for manipulating the interfaces between RF and baseband circuit. Taking into consideration the scaling down of IC manufacture technologies, predictions for the performance of future generations of circuits are made. With simulation results indicating energy per bit and area per bit being reduced by 7% and 11% respectively, we can conclude that the proposed method is a worthwhile guideline for the design of future multi-core computer ICs.展开更多
A 3D compressible nonhydrostatic dynamic core based on a three-point multi-moment constrained finite-volume (MCV) method is developed by extending the previous 2D nonhydrostatic atmospheric dynamics to 3D on a terrain...A 3D compressible nonhydrostatic dynamic core based on a three-point multi-moment constrained finite-volume (MCV) method is developed by extending the previous 2D nonhydrostatic atmospheric dynamics to 3D on a terrainfollowing grid. The MCV algorithm defines two types of moments: the point-wise value (PV) and the volume-integrated average (VIA). The unknowns (PV values) are defined at the solution points within each cell and are updated through the time evolution formulations derived from the governing equations. Rigorous numerical conservation is ensured by a constraint on the VIA moment through the flux form formulation. The 3D atmospheric dynamic core reported in this paper is based on a three-point MCV method and has some advantages in comparison with other existing methods, such as uniform third-order accuracy, a compact stencil, and algorithmic simplicity. To check the performance of the 3D nonhydrostatic dynamic core, various benchmark test cases are performed. All the numerical results show that the present dynamic core is very competitive when compared to other existing advanced models, and thus lays the foundation for further developing global atmospheric models in the near future.展开更多
文摘A novel kind of multi-core magnetic composite particles, the surfaces of which were respectively mo- dified with goat-anti-mouse IgG and antitransferrin receptor(anti-CD71), was prepared. The fetal nucleated red blood cells(FNRBCs) in the peripheral blood of a gravida were rapidly and effectively enriched and separated by the mo- dified multi-core magnetic composite particles in an external magnetic field. The obtained FNRBCs were used for the identification of the fetal sex by means of fluorescence in situ hybridization(FISH) technique. The results demonstrate that the multi-core magnetic composite particles meet the requirements for the enrichment and speration of FNRBCs with a low concentration and the accuracy of detetion for the diagnosis of fetal sex reached to 95%. Moreover, the obtained FNRBCs were applied to the non-invasive diagnosis of Down syndrome and chromosome 3p21 was de- tected. The above facts indicate that the novel multi-core magnetic composite particles-based method is simple, relia- ble and cost-effective and has opened up vast vistas for the potential application in clinic non-invasive prenatal diag- nosis.
基金Project(2008AA01A201) supported the National High-tech Research and Development Program of ChinaProjects(60833004, 60633050) supported by the National Natural Science Foundation of China
文摘Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.
文摘A variation-aware task mapping approach is proposed for a multi-core network-on-chips with redundant cores, which includes both the design-time mapping and run-time scheduling algorithms. Firstly, a design-time genetic task mapping algorithm is proposed during the design stage to generate multiple task mapping solutions which cover a maximum range of chips. Then, during the run, one optimal task mapping solution is selected. Additionally, logical cores are mapped to physically available cores. Both core asymmetry and topological changes are considered in the proposed approach. Experimental results show that the performance yield of the proposed approach is 96% on average, and the communication cost, power consumption and peak temperature are all optimized without loss of performance yield.
基金supported by National B a-sic Research Program of China(Grant No.2012CB315905)National Natural Science Foundation of China(Grant No.61501027)+1 种基金China Postdoctoral Science Foundation(Grant No.2015M570934)Fundamental Research Funds for the Central Universities(Grant No.FRF-TP-15-031A1)
文摘Decreasing mode coupling coefficient(κ) is an effective approach to suppress the inter-core crosstalk. Therefore, we deploy a low index rod and rectangle trench in the middle of two neighboring cores to reduce κ so that the overlap of electric field distribution can be suppressed. We also propose approximate analytical solution(AAS) for κ of two crosstalk suppression models, which are two cores with one low index rod deployed in the middle and two cores with one low index rectangle trench deployed in the middle. We then do some modification for the results obtained by AAS and the modified results are proved to agree well with that obtained by finite element method(FEM). Therefore, we can use the modified AAS to get inter-core crosstalk for abovementioned two models quickly.
基金Supported by the National High Technology Research and Development Programme of China(No.2010AA012302,2009AA01 A134)Tsinghua National Laboratory for Information Science and Technology(TNList)Cross-discipline Foundation
文摘This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication(SpGEMM).It classifies the cache misses into two categories;one is caused by the irregular distribution pattern of the multiplier-matrix,and the other is caused by the multiplicand.For each of them,the paper puts forward an optimization method respectively.The first hash based method removes cache misses of the 1 st category effectively,and improves the performance by a factor of 6 on an Intel 8-core CPU for the best cases.For cache misses of the 2nd category,it proposes a new cache replacement algorithm,which achieves a cache hit rate much higher than other historical knowledge based algorithms,and the algorithm is applicable on CELL and GPU.To further verify the effectiveness of our methods,we implement our algorithm on GPU,and the performance perfectly scales with the size of on-chip storage.
基金supports from National High Technology 863 Program of China(No.2013AA013403,2015AA015501,2015AA015502,2015AA015504)National NSFC(No.61425022/61522501/61307086/61475024/61275158/61201151/61275074/61372109)+4 种基金Beijing Nova Program(No.Z141101001814048)Beijing Excellent Ph.D.Thesis Guidance Foundation(No.20121001302)the Universities Ph.D.Special Research Funds(No.20120005110003/20120005120007)Fund of State Key Laboratory of IPOC(BUPT)P.R.China
文摘In this paper, the influencing factors that affect few-mode and multi core optical fiber channel are analyzed in a comprehensive way. The theoretical modeling and computer simulation of the information channel are carried out and then the modeling scheme of few-mode multicore optical fiber channel based on non-uniform mode field distribution is put forward. The proposed modeling scheme can not only exponentially increases the system capacity through fewmode multi-core optical fiber channel, but has better transmission performance compared to the channel of the same type to the uniform channel revealing from the simulation results.
基金Supported by the China Postdoctoral Science Foundation(No.2014M552115)the Fundamental Research Funds for the Central Universities,ChinaUniversity of Geosciences(Wuhan)(No.CUGL140833)the National Key Technology Support Program of China(No.2011BAH06B04)
文摘In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of parallel processing mechanisms.One is that it can evenly allocate tasks to each server node in the cluster and the other is that it can implement the load balancing inside a server node.Based on the strategy,a new web-based spatial computing model is designed in this paper,in which,a task response ratio calculation method,a request queue buffer mechanism and a thread scheduling strategy are focused on.Experimental results show that the new model can fully use the multi-core computing advantage of each server node in the concurrent access environment and improve the average hits per second,average I/O Hits,CPU utilization and throughput.Using speed-up ratio to analyze the traditional model and the new one,the result shows that the new model has the best performance.The performance of the multi-core server nodes in the cluster is optimized;the resource utilization and the parallel processing capabilities are enhanced.The more CPU cores you have,the higher parallel processing capabilities will be obtained.
文摘The state-of-the-art multi-core computer systems are based on Very Large Scale three Dimensional (3D) Integrated circuits (VLSI). In order to provide high-speed vertical data transmission in such 3D systems, efficient Through-Silicon Via (TSV) technology is critically important. In this paper, various Radio Frequency (RF) TSV designs and models are proposed. Specifically, the Cu-plug TSV with surrounding ground TSVs is used as the baseline structure. For further improvement, the dielectric coaxial and novel air-gap coaxial TSVs are introduced. Using the empirical parameters of these coaxial TSVs, the simulation results are obtained demonstrating that these coaxial RF-TSVs can provide two-order higher of cut-off frequencies than the Cu-plug TSVs. Based on these new RF-TSV technologies, we propose a novel 3D multi-core computer system as well as new architectures for manipulating the interfaces between RF and baseband circuit. Taking into consideration the scaling down of IC manufacture technologies, predictions for the performance of future generations of circuits are made. With simulation results indicating energy per bit and area per bit being reduced by 7% and 11% respectively, we can conclude that the proposed method is a worthwhile guideline for the design of future multi-core computer ICs.
基金supported by the National Key Research and Development Program of China (Grant Nos. 2017YFC1501901 and 2017YFA0603901)the Beijing Natural Science Foundation (Grant No. JQ18001)
文摘A 3D compressible nonhydrostatic dynamic core based on a three-point multi-moment constrained finite-volume (MCV) method is developed by extending the previous 2D nonhydrostatic atmospheric dynamics to 3D on a terrainfollowing grid. The MCV algorithm defines two types of moments: the point-wise value (PV) and the volume-integrated average (VIA). The unknowns (PV values) are defined at the solution points within each cell and are updated through the time evolution formulations derived from the governing equations. Rigorous numerical conservation is ensured by a constraint on the VIA moment through the flux form formulation. The 3D atmospheric dynamic core reported in this paper is based on a three-point MCV method and has some advantages in comparison with other existing methods, such as uniform third-order accuracy, a compact stencil, and algorithmic simplicity. To check the performance of the 3D nonhydrostatic dynamic core, various benchmark test cases are performed. All the numerical results show that the present dynamic core is very competitive when compared to other existing advanced models, and thus lays the foundation for further developing global atmospheric models in the near future.