We propose a pilot-domain non-orthogonal multiple access (NOMA) scheme for uplink grant-free random access by massive numbers of devices in massive multiple-input multiple-output (MIMO) maritime communication systems. These scenarios are characterized by numerous devices with sporadic access behavior, so only a subset of them is active at any time. Because of the massive number of potential devices in the network, it is infeasible to assign a unique orthogonal pilot to each device in advance. In such scenarios, pilot decontamination is a crucial problem. In this paper, the devices are randomly assigned non-orthogonal pilots, each constructed as a linear combination of some orthogonal pilots. We show that a bipartite graph can conveniently describe the interference cancellation (IC) processes of pilot decontamination. High spectrum efficiency (SE) and low outage probability can be obtained by selecting the numbers of orthogonal pilots according to the given probability distribution. Numerical evaluations show that the proposed pilot-domain NOMA decreases the outage probability from 20% to 2e-12 at an SE of 4 bits/s/Hz for a single device, compared to the conventional method of slotted ALOHA with 1024 antennas at the BS, or increases the SE from 1.2 bits/s/Hz to 4 bits/s/Hz at an outage probability of 2e-12 in contrast with Welch bound equality (WBE) non-orthogonal pilots.
With the evolution of communication standards, Software Defined Radio (SDR) faces an increasingly important problem: balancing ever more complex wireless communication algorithms against the relatively limited processing capability of the hardware. Competition for computing resources exacerbates the problem and increases the time delay of the SDR system. This paper presents an integrated optimization method for the real-time performance of SDR on the Linux operating system (OS). The method is composed of three parts: a real-time scheduling policy, which ensures higher priority for SDR tasks; CGROUPS, used to manage and redistribute the computing resources; and a fine-grained system timer, which makes process preemption more accurate. According to the experiments, applying the method reduces the round-trip data transfer latency enough to meet the requirement for TD-SCDMA.
The two-way decode-and-forward (DF) relay technique is an efficient method to improve system performance in 5G networks. However, traditional orthogonal frequency division multiplexing (OFDM) based two-way relay systems only consider a per-subcarrier relay strategy, which treats each subcarrier as a separate channel and results in significant sum rate loss, especially in fading environments. In this paper, a joint coding scheme over multiple subcarriers is applied for multi-pair users in two-way relay systems to obtain multiuser diversity. A generalized subcarrier pairing strategy is proposed to permit each user pair to occupy different subcarriers during the two transmission phases, i.e., the multiple access and broadcast phases. Moreover, a low-complexity joint resource allocation scheme is proposed to improve the spectrum efficiency with an additional multiuser diversity gain. Numerical simulations are finally provided to verify the efficacy of our proposal.
In the field of supercomputing, one key issue for scalable shared-memory multiprocessors is the design of the directory, which denotes the sharing state for a cache block. A good directory design aims to balance three key attributes: reasonable memory overhead, sharer position precision, and implementation complexity. However, researchers often face the problem that gaining one attribute may result in losing another. The paper proposes an elastic pointer directory (EPD) structure based on the analysis of shared-memory applications, exploiting the fact that the number of sharers for each directory entry is typically small. Analysis results show that for 4096 nodes, the ratio of memory overhead to the full-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on 16-node and 64-node cache coherence non-uniform memory access (CC-NUMA) multiprocessors show that the corresponding pointer overflow probability is reduced significantly. The performance is observed to be better than that of a limited pointers directory and almost identical to the full-map directory, except for the slight implementation complexity. Using the directory cache to exploit directory access locality is also studied. The experimental results show that this is a promising approach for the state-of-the-art high performance computing domain.
The authors of this paper have previously proposed the global virtual data space system (GVDS) to aggregate the scattered and autonomous storage resources in China's national supercomputer grid (National Supercomputing Center in Guangzhou, National Supercomputing Center in Jinan, National Supercomputing Center in Changsha, Shanghai Supercomputing Center, and Computer Network Information Center in Chinese Academy of Sciences) into a storage system that spans the wide area network (WAN), which realizes the unified management of global storage resources in China. At present, the GVDS has been successfully deployed in the China National Grid environment. However, when accessing and sharing remote data in the WAN, the GVDS causes redundant transmission of data and wastes a lot of network bandwidth. In this paper, we propose an edge cache system as a supplementary system of the GVDS to improve the performance of upper-level applications accessing and sharing remote data. Specifically, we first design the architecture of the edge cache system, and then study the key technologies of this architecture: the edge cache index mechanism based on double-layer hashing, the edge cache replacement strategy based on the GDSF algorithm, the request routing based on the consistent hashing method, and the cluster member maintenance method based on the SWIM protocol. In terms of function, the experimental results show that the edge cache system has successfully implemented the relevant operations (read, write, deletion, modification, etc.) and is compatible with the POSIX interface. In terms of performance, it can greatly reduce the amount of data transmission and increase the data access bandwidth when the accessed file is located at the edge cache system, i.e., its performance is close to that of a network file system in the local area network (LAN).
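The request routing named in the abstract above relies on consistent hashing. The following is a minimal, generic sketch of that technique, not the GVDS implementation: the class name, the node names, the use of MD5, and the virtual-node count are all illustrative assumptions.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    """Map a string to a 64-bit position on the hash ring (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring that routes keys to cache nodes via virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual nodes per physical node
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _ring_position(f"{node}#{i}")
            if point in self._owner:
                continue       # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        # Filter positions first, while the owner map still contains the node.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def route(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        if not self._points:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_right(self._points, _ring_position(key))
        return self._owner[self._points[i % len(self._points)]]
```

With this construction, removing a node only remaps the keys that node owned; every other key keeps its owner, which is the property that makes the method attractive for a cache cluster whose membership changes.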
Although the genetic algorithm has been widely used in the polarity optimization of mixed-polarity Reed-Muller (MPRM) logic circuits, few studies have taken into account the polarity conversion sequence. To improve the efficiency of polarity optimization of MPRM logic circuits, we propose an efficient and fast polarity optimization approach (FPOA) that considers the polarity conversion sequence. The main idea behind the FPOA is as follows: first, the best polarity conversion sequence of the polarity set awaiting evaluation is obtained using the proposed hybrid genetic algorithm (HGA); second, each polarity in the polarity set is converted according to the best polarity conversion sequence obtained by the HGA. The proposed FPOA is implemented in C, and a comparative analysis is presented for MCNC benchmark circuits. The experimental results show that for circuits with more variables, the FPOA is highly effective in improving the efficiency of polarity optimization of MPRM logic circuits compared with the traditional polarity optimization approach, which neglects the polarity conversion sequence, and with the improved polarity optimization approach with a heuristic technique.
Delay optimization has recently attracted significant attention. However, few studies have focused on the delay optimization of mixed-polarity Reed-Muller (MPRM) logic circuits. In this paper, we propose an efficient delay optimization approach (EDOA) for MPRM logic circuits under the unit delay model, which can derive an optimal MPRM logic circuit with minimum delay. First, the simplest MPRM expression, with the fewest product terms, is obtained using a novel Reed-Muller expression simplification approach (RMESA) that considers don't-care terms. Second, a minimum delay decomposition approach based on a Huffman tree construction algorithm is applied to the simplest MPRM expression. Experimental results on MCNC benchmark circuits demonstrate that, compared to Berkeley SIS 1.2 and ABC, the EDOA can significantly reduce delay for most circuits. Furthermore, for a few circuits, the EDOA incurs an area penalty while reducing delay.
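To illustrate the idea of a Huffman-style minimum delay decomposition under the unit delay model, here is a minimal sketch. It is a generic greedy construction, not the EDOA itself: it assumes that a wide gate is decomposed into 2-input gates, that each 2-input gate adds one unit of delay, and that the inputs arrive at known times. The function name and the example arrival times are hypothetical.

```python
import heapq


def min_output_delay(arrival_times):
    """Output arrival time of a greedily built tree of 2-input, unit-delay gates.

    As in Huffman tree construction, the two earliest-arriving signals are
    merged first; the merged signal arrives one unit after the later of the two.
    """
    if not arrival_times:
        raise ValueError("at least one input signal is required")
    heap = list(arrival_times)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + 1)
    return heap[0]


# Four inputs, one of which arrives late (time 3).
print(min_output_delay([0, 0, 0, 3]))  # prints 4
```

In the example, a balanced tree that pairs the late input with an early one at the first level would give an output delay of 5, whereas merging the early inputs first hides their gate delays behind the late arrival and yields 4.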
With the advent of new computing paradigms, parallel file systems serve not only traditional scientific computing applications but also non-scientific computing applications, such as financial computing, business, and public administration. Parallel file systems provide storage services for multiple applications. As a result, various requirements need to be met. However, parallel file systems usually provide a unified storage solution, which cannot meet specific application needs. In this paper, an extended file handle scheme is proposed to deal with this problem. The original file handle is extended to record I/O optimization information, which allows file systems to specify optimizations for a file or directory based on workload characteristics. Therefore, fine-grained management of I/O optimizations can be achieved. On the basis of the extended file handle scheme, data prefetching and small file optimization mechanisms are proposed for parallel file systems. The experimental results show that the proposed approach improves the aggregate throughput of the overall system by up to 189.75%.
Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its abundant computing and storage resources. However, the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging. To achieve higher system performance, this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments. The collaborative scheduling strategy integrates lightweight solution selection, redundant data placement, and task stealing mechanisms, optimizing task distribution and data placement to achieve efficient computing in wide-area environments. The experimental results indicate that, compared with the state-of-the-art collaborative scheduling algorithm HPS+, the proposed scheduling strategy reduces the makespan by 23.24%, improves computing and storage resource utilization by 8.28% and 21.73%, respectively, and achieves similar global data migration costs.
Funding: Supported by the Key R&D Program of China under Grant 2018YFB1801102; the National Natural Science Foundation of China (U1736108); the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (61621091); and the Tsinghua University Initiative Scientific Research Program (20193080005).
Funding: Supported by the National Natural Science Foundation of China (NSFC) (No. 61501527); the State's Key Project of Research and Development Plan (No. 2016YFE0122900-3); the Fundamental Research Funds for the Central Universities; the Basic Research Foundation of Science Technology and Innovation Commission of Shenzhen Municipality (No. JCYJ20150630153033410); the SYSU-CMU Shunde International Joint Research Institute; and the 2016 Major Project of Collaborative Innovation in Guangzhou (Research and Application of Ground Satellite Communication Systems for Space Broadband Information Networks).
Funding: Supported by the National Natural Science Foundation of China (61232009, 61370059); the High Technology Research and Development Program of China (863 Program) (2011AA01A205); and the Fund of the State Key Laboratory of Software Development Environment (SKLSDE2012ZX06).
Funding: This work was supported by the Beijing Natural Science Foundation Funded Project (No. 4110001), the National S&T Major Project (No. 2011ZX03003-002), Tsinghua Independent Research (No. 2010TH203-02), and Samsung Company.
Funding: Supported by the National Key Research and Development Program of China (2018YFB0203901); the National Natural Science Foundation of China (Grant No. 61772053); the Hebei Youth Talents Support Project (BJ2019008); and the Natural Science Foundation of Hebei Province (F2020204003).
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61370059 and 61232009); the Beijing Natural Science Foundation (4152030); the Fundamental Research Funds for the Central Universities (YWF-15-GJSYS-085, YWF-14-JSJXY-14); the Open Project Program of the National Engineering Research Center for Science & Technology Resources Sharing Service (Beihang University); the fund of the State Key Laboratory of Computer Architecture (CARCH201507); and the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2016ZX-13).
Funding: Supported by the National Key R&D Program of China (2018YFB0203901); the National Natural Science Foundation of China (Grant No. 61772053); the Science Challenge Project (No. TZ2016002); and the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2017ZX-10).
Funding: This work was supported by the National Key R&D Program of China (2018YFB0203901); the National Natural Science Foundation of China (Grant No. 61772053); and the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2020ZX15).