Funding: Health Sciences Centre Medical Staff Council Resident Research Award
Abstract: AIM To investigate the effects of direct-to-colonoscopy pathways on information-seeking behaviors and anxiety among colonoscopy-naïve patients. METHODS Colonoscopy-naïve patients at two tertiary care hospitals completed a survey immediately prior to their scheduled outpatient procedure and before receiving sedation. Survey items included clinical pathway (direct or consult), procedure indication (cancer screening or symptom investigation), telephone and written contact from the physician endoscopist's office, information sources, and pre-procedure anxiety. Participants reported pre-procedure anxiety on a 10-point scale anchored by "very relaxed" (1) and "very nervous" (10). At least three months after the procedure, patient medical records were reviewed to determine sedative dose, procedure indications, and any adverse events. The primary comparison was between the direct and consult pathways. Given the very different implications, a secondary analysis considered the patient-reported indication for the procedure (symptoms or screening). Effects of pathway (direct vs consult) were compared both within and between the screening and symptom subgroups. RESULTS Of 409 patients who completed the survey, 34% followed a direct pathway. Indications for colonoscopy were similar in each group. The majority of the participants were women (58%), married (61%), and internet users (81%). The most important information source was the family physician in the Direct group and the specialist physician in the Consult group. Use of other information sources, including the internet (20% vs 18%) and family and friends (64% vs 53%), was similar in the Direct and Consult groups, respectively. Only 31% of the 81% who were internet users accessed internet health information. Most sought fundamental information, such as what a colonoscopy is or why it is done. Pre-procedure anxiety did not differ between care pathways. Those undergoing colonoscopy for symptoms reported greater anxiety [mean 5.3, 95%CI: 5.0-5.7 (10-point Likert scale)] than those undergoing screening colonoscopy (4.3, 95%CI: 3.9-4.7). CONCLUSION Procedure indication (cancer screening or symptom investigation) was more closely associated with information-seeking behaviors and pre-procedure anxiety than care pathway.
Abstract: Carotid artery stenting (CAS) is an alternative treatment for patients with severe carotid artery stenosis, especially those with prohibitively high surgical risks.……
Funding: Supported by the National Natural Science Foundation of China (Nos. 31370373 and 21102084) and the Natural Science Foundation of Hubei Province (No. 2012FKC14401)
Abstract: Two 1-methyl-1H-benzo[d]imidazole derivatives, C_(18)H_(14)CuN_4O_4·C_4H_8O_2 (1) and C_9H_9N_3O (2), have been synthesized and characterized by NMR, MS, FT-IR, elemental analysis and single-crystal X-ray diffraction. Compound 1 crystallizes in the monoclinic system, space group P2_1/n, with a = 9.6888(3), b = 7.3772(2), c = 14.3277(4) Å, β = 95.819(3)°, V = 1018.81(5) Å^3, M_r = 501.98, Z = 2, D_c = 1.636 g/cm^3, F(000) = 518, μ = 1.123 mm^(-1), Mo Kα radiation (λ = 0.71073 Å), the final R = 0.0325 and wR = 0.0859 for 1821 observed reflections with I > 2σ(I). Compound 2 crystallizes in the monoclinic system, space group C2/c, with a = 14.2908(14), b = 14.4268(13), c = 8.4802(6) Å, β = 108.513(9)°, V = 1657.9(3) Å^3, M_r = 175.19, Z = 8, D_c = 1.404 g/cm^3, F(000) = 736, μ = 0.097 mm^(-1), Mo Kα radiation (λ = 0.71073 Å), the final R = 0.0563 and wR = 0.1531 for 1231 observed reflections with I > 2σ(I). Intermolecular (N-H···N, N-H···O) and intramolecular (N-H···N, C-H···O) hydrogen bonds, as well as C-H···π and π-π stacking interactions, help to stabilize the crystal structure of compound 2.
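The reported cell volumes and calculated densities can be cross-checked from the other cell parameters. The sketch below assumes only the standard monoclinic relations V = a·b·c·sin β and D_c = Z·M_r / (N_A·V); the function names are illustrative and do not come from the paper.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1

def monoclinic_volume(a, b, c, beta_deg):
    """Unit-cell volume in cubic angstroms for a monoclinic cell (alpha = gamma = 90 deg)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def calculated_density(z, molar_mass, volume_a3):
    """Calculated density in g/cm^3; 1 cubic angstrom = 1e-24 cm^3."""
    return z * molar_mass / (AVOGADRO * volume_a3 * 1e-24)

# Compound 1 (space group P2_1/n), values taken from the abstract
v1 = monoclinic_volume(9.6888, 7.3772, 14.3277, 95.819)
d1 = calculated_density(2, 501.98, v1)

# Compound 2 (space group C2/c), values taken from the abstract
v2 = monoclinic_volume(14.2908, 14.4268, 8.4802, 108.513)
d2 = calculated_density(8, 175.19, v2)

print(f"compound 1: V = {v1:.2f} A^3, D_c = {d1:.3f} g/cm^3")
print(f"compound 2: V = {v2:.1f} A^3, D_c = {d2:.3f} g/cm^3")
```

Both results should agree with the reported V and D_c values to within the quoted uncertainties.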
Abstract: In this paper, a direct sequence spread spectrum multiple access (DS/SSMA) communication system employing serially concatenated trellis coded modulation (TCM) and continuous phase modulation (CPM) over a flat Rayleigh fading channel is presented. The performance of this concatenated TCM/CPM DS/SSMA system is explored through theoretical analysis and numerical simulations. The results demonstrate that, under the same conditions, this DS/SSMA system achieves significant improvements in error probability over systems using TCM alone or CPM alone with different modulation indices.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60872041, 61072066) and the Fundamental Research Funds for the Central Universities (JYI0000903001, JYI0000901034)
Abstract: To resolve the problem of quantitative analysis in hybrid cloud, a quantitative analysis method based on security entropy is proposed. Firstly, following information theory, the security entropy is put forward to calculate the uncertainty of the system's determinations on irregular access behaviors. Secondly, based on the security entropy, security theorems of hybrid cloud are defined. Finally, typical access control models are analyzed with the method, the method's practicability is validated, and the security and applicability of these models are compared. Simulation results show that the proposed method is suitable for the quantitative security analysis of access control models and for evaluating access control capability in hybrid cloud.
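The abstract does not give the paper's definition of security entropy. As a rough illustration of the underlying idea, the sketch below assumes plain Shannon entropy over the probabilities with which a system reaches each possible determination (for example, "permit" or "deny") for an irregular access request; the function name and the example distributions are hypothetical.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution (assumed stand-in for security entropy)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A system that always denies an irregular access has no uncertainty.
certain = shannon_entropy([1.0, 0.0])

# A system that permits or denies with equal probability is maximally uncertain.
uncertain = shannon_entropy([0.5, 0.5])

print(abs(certain), uncertain)
```

Under this assumption, lower entropy means the access control model decides irregular accesses more predictably.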
Abstract: A novel PCI Express (peripheral component interconnection express) direct memory access (DMA) transaction method using the bridge chip PEX 8311 is proposed. Furthermore, a new method for optimizing PCI Express DMA transactions by improving both bus efficiency and DMA efficiency is presented. A finite state machine (FSM) responsible for data and address cycles on the PCI Express bus is introduced, and a continuous data burst is realized, which greatly improves bus efficiency. In software design, a driver framework based on the Windows driver model (WDM) and three DMA optimization options for the proposed PCI Express interface are presented to improve DMA efficiency. Experiments show that both the read and write hardware transaction speeds in this paper exceed the PCI theoretical maximum speed (133 MBytes/s).
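The 133 MBytes/s figure quoted as the PCI theoretical maximum corresponds to conventional 32-bit PCI clocked at about 33.33 MHz, transferring one bus word per clock. A minimal sketch of that arithmetic follows; the function name is illustrative.

```python
def peak_bandwidth_mbytes(clock_mhz, bus_width_bits):
    """Peak transfer rate in MBytes/s, assuming one bus-width transfer per clock cycle."""
    return clock_mhz * bus_width_bits / 8

# Conventional PCI: 32-bit bus at roughly 33.33 MHz
pci_peak = peak_bandwidth_mbytes(33.33, 32)
print(f"{pci_peak:.1f} MBytes/s")
```

This is an upper bound for sustained bursts; address cycles and turnaround cycles reduce the rate achieved in practice, which is why continuous bursts matter for bus efficiency.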
Abstract: High-speed data communication between a digital signal processor and the host is required to meet the demands of most real-time systems. PCI bus technology is a solution to this problem. The principle of PCI-based data communication is explained. In addition, data transfer between synchronous dynamic RAM (SDRAM) and a mapped space of on-chip memory (L2) by expansion direct memory access (EDMA) has been realized.
Funding: Supported by the NSFC (National Natural Science Foundation of China), the 863 Program (2006AA1332), ERIPKU, and the Program for New Century Excellent Talents in University.
Abstract: The processing speed of the communication between nodes in a parallel processor has become the major bottleneck of the processor's performance. RDMA (Remote Direct Memory Access) technology has drawn more attention recently because it can transfer larger amounts of data with higher speed and reliability. A 4DSP (4 Digital Signal Processing) module composed of Tiger-SHARC201 chips is connected by LVDS (Low Voltage Differential Signal) circuits. This paper proposes a general and reconfigurable RDMA platform and its corresponding communication protocol, with all routes linked on a zero-copy basis. The protocol transfers DSP messages by means of DMA interrupts and is applied to massive remote image impression, which reduces memory needs and the working burden of the CPU. The experimental results show that this platform is efficient and flexible, and that it can be expanded and integrated at a larger scale in the next development stages.
Funding: Partially supported by the National Natural Science Foundation of China (No. 60272071) and the Research Fund for the Doctoral Program of Higher Education of China (Nos. 20020698024 & 20030698027).
Abstract: This paper proposes a closed-form joint space-time channel and Direction Of Arrival (DOA) blind estimation algorithm for space-time coded Multi-Carrier Code Division Multiple Access (MC-CDMA) systems equipped with a Uniform Linear Array (ULA) at the base station in frequency-selective fading environments. The algorithm uses an ESPRIT-like method to separate multiple co-channel users with different impinging DOAs. As a result, the DOAs for multiple users are obtained. In particular, a set of signal subspaces, each of which is spanned by the space-time vector channels of an individual user, is also obtained. From these signal subspaces, the space-time channels of multiple users are estimated using the subspace method. Computer simulations illustrate both the validity and the overall performance of the proposed scheme.
Funding: Supported by the University Natural Science Research Project of Jiangsu (No. 03KJB510088) and the National Natural Science Foundation of China (No. 60572130).
Abstract: In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, the chip waveform affects the implementation, system bandwidth, envelope uniformity, eye pattern and Multiple user Access Interference (MAI). In this paper, based on an elementary density function of a second-order polynomial, a class of second-order continuity pulses is proposed. Some members of this class have a faster decay rate, a larger eye opening, a more uniform envelope and stronger anti-MAI capability than the Nyquist waveform. The paper addresses the normalized-bandwidth-pulse-shape-factor product, the decay rate of the tail of the time waveform, the opening of the eye diagram, and the envelope uniformity of the second-order continuity pulses, which provide the basic information for selecting the chip pulse for CDMA systems.
Funding: Supported by the National Natural Science Foundation of China (No. 60472104).
Abstract: This paper proposes Steepest Decreasing Constant Modulus Algorithm (SDCMA) detection in the frequency domain for MultiCarrier Direct Sequence-Code Division Multiple Access (MC DS-CDMA) systems. The proposed algorithm is used to equalize the independent fadings of all subcarriers. SDCMA blind detection is also compared with subspace-based Minimum Mean-Squared Error (MMSE) detection. The simulation results show that the performance of SDCMA blind detection is superior to that of subspace-based MMSE detection, and that its complexity is much lower.
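The abstract does not spell out the SDCMA update. As a rough illustration of the constant modulus idea it builds on, the sketch below applies the standard stochastic-gradient constant modulus algorithm to a single complex tap, as one might for one flat-faded subcarrier. The step size, the initial weight, the channel gain and the QPSK test signal are all assumptions chosen for the example, not values from the paper.

```python
import cmath

def cma_single_tap(received, mu=0.5, w0=1 + 0j, target=1.0):
    """Blindly adapt one complex weight w so that |w * x|^2 approaches `target`."""
    w = w0
    for x in received:
        y = w * x                           # equalizer output
        error = y * (abs(y) ** 2 - target)  # gradient term of (|y|^2 - target)^2
        w = w - mu * error * x.conjugate()  # steepest-descent step on the CM cost
    return w

# Unit-modulus QPSK symbols sent through an assumed flat channel gain of 0.5
symbols = [cmath.exp(1j * (cmath.pi / 4 + k * cmath.pi / 2)) for k in range(4)] * 100
gain = 0.5
received = [gain * s for s in symbols]

w = cma_single_tap(received)
print(abs(w))  # should settle near 1 / gain
```

No training symbols are used: the weight is driven only by the deviation of the output modulus from its known constant value. The phase of the channel is not recovered by this cost function, which is a known property of constant modulus methods.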
Abstract: In this paper, the complexity and performance of Auxiliary Vector (AV) based reduced-rank filtering are addressed. The AV filters presented in previous papers have the general form of the sum of the signature vector of the desired signal and a set of weighted AVs, and can be classified into three categories according to the orthogonality of their AVs and the optimality of the AV weight coefficients. The AV filter with orthogonal AVs and optimal weight coefficients has the best performance, but it requires considerable computation and suffers from numerically unstable operation. To reduce its computational load while keeping the superior performance, several low-complexity algorithms are proposed to calculate the AVs and their weight coefficients efficiently. The diagonal loading technique is also introduced to solve the numerical instability problem without increasing complexity. The performance of the three types of AV filters is also compared through their application to interference suppression in Direct Sequence Code Division Multiple Access (DS-CDMA) systems.
Funding: Supported in part by the U.S. National Science Foundation under Grant No. CCF-2132049, a Google Research Award, and a Meta Faculty Research Award; and by the Expanse cluster at SDSC (San Diego Supercomputer Center) through allocation CIS210053 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by the U.S. National Science Foundation under Grant Nos. 2138259, 2138286, 2138307, 2137603, and 2138296.
Abstract: Machine learning techniques have become ubiquitous in both industry and academic applications. Increasing model sizes and training data volumes necessitate fast and efficient distributed training approaches. Collective communications greatly simplify inter- and intra-node data transfer and are an essential part of the distributed training process, as information such as gradients must be shared between processing nodes. In this paper, we survey the current state-of-the-art collective communication libraries (namely xCCL, including NCCL, oneCCL, RCCL, MSCCL, ACCL, and Gloo), with a focus on the industry-led ones for deep learning workloads. We investigate the design features of these xCCLs, discuss their use cases in industry deep learning workloads, compare their performance with industry-made benchmarks (i.e., NCCL Tests and PARAM), and discuss key take-aways and interesting observations. We believe our survey sheds light on potential research directions for future xCCL designs.
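To make the role of a collective concrete, the sketch below simulates an all-reduce (sum) over per-worker gradient buffers using the well-known ring pattern: a reduce-scatter phase followed by an all-gather phase. It is a single-process illustration of the communication schedule only, it does not use or reproduce the API of any of the surveyed libraries, and the function name and data are made up for the example.

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce (sum) over equal-length per-rank lists."""
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "buffer length must be divisible by the number of ranks"
    chunk = length // n
    data = [list(b) for b in buffers]

    def span(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n to rank r + 1,
    # which adds it to its own copy. Payloads are snapshotted so that all transfers
    # within one step behave as if they happened simultaneously.
    for step in range(n - 1):
        sends = [((r - step) % n, data[r][span((r - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n
            data[dst][span(c)] = [x + y for x, y in zip(data[dst][span(c)], payload)]

    # Rank r now holds the fully reduced chunk (r + 1) mod n.
    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][span((r + 1 - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(sends):
            data[(r + 1) % n][span(c)] = list(payload)
    return data

grads = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
reduced = ring_allreduce(grads)
print(reduced[0])
```

Each rank sends and receives only one chunk per step, so the per-rank traffic stays roughly constant as the number of ranks grows, which is one reason this pattern is common for gradient exchange.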
Abstract: The performance of online analytical processing (OLAP) is critical for meeting the increasing requirements of massive-volume analytical applications. Typical techniques, such as in-memory processing, column storage, and join indexes, focus on high-performance storage media, efficient storage models, and reduced query processing. While they perform OLAP applications effectively, there is a vital limitation: main-memory database based OLAP (MMOLAP) cannot provide high performance for a large data set. In this paper, we propose a novel memory dimension table model, in which the primary keys of the dimension table can be directly mapped to dimensional tuple addresses. To achieve higher performance of dimensional tuple access, we optimize our storage model for dimension tables based on OLAP query workload features. We present the directly dimensional tuple accessing (DDTA) based join (DDTA-JOIN), a technique that optimizes query processing on the memory dimension table through direct dimensional tuple access. We also propose an optimization of the predicate tree that shortens predicate operation length by pruning useless predicate processing. Our experimental results show that the DDTA-JOIN algorithm is superior to both simulated row-store main-memory query processing and the open-source column-store main-memory database MonetDB, thanks to the reduced join cost and simple yet efficient query processing.
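The core idea of mapping a dimension table's primary key directly to a tuple address can be illustrated with an array-backed dimension table, where a fact row's foreign key is used as an index instead of being probed in a hash table. The sketch below is a simplified illustration under the assumption of dense surrogate keys 1..N; the table contents, column layout and function name are hypothetical and are not taken from the paper.

```python
# Dimension table stored as an array: the tuple with primary key k sits at index k - 1.
dim_customer = [
    ("Alice", "EU"),    # key 1
    ("Bob", "US"),      # key 2
    ("Chen", "ASIA"),   # key 3
]

# Fact rows: (customer_key, amount)
fact_sales = [(1, 100.0), (3, 250.0), (2, 75.0), (1, 40.0)]

def sum_sales_for_region(fact, dim, region):
    """Join each fact row to its dimension tuple by direct addressing, then filter and sum."""
    total = 0.0
    for key, amount in fact:
        _name, dim_region = dim[key - 1]  # direct tuple access, no hash build or probe
        if dim_region == region:
            total += amount
    return total

eu_total = sum_sales_for_region(fact_sales, dim_customer, "EU")
print(eu_total)
```

The join reduces to one array lookup per fact row, which is the kind of cost reduction the abstract attributes to direct dimensional tuple access.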
Funding: This work was supported by the Key-Area Research and Development Program of Guangdong Province of China under Grant No. 2020B0101390001, the National Natural Science Foundation of China under Grant Nos. 61772265 and 62072228, the Fundamental Research Funds for the Central Universities of China, the Collaborative Innovation Center of Novel Software Technology and Industrialization of Jiangsu Province of China, and the Jiangsu Innovation and Entrepreneurship (Shuangchuang) Program of China.
Abstract: Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed for a lossless underlying network and cannot endure a high packet loss rate. However, besides switch buffer overflow, there is another kind of packet loss in the RDMA network, i.e., packet corruption, which has not been discussed in depth. Packet corruption incurs long application tail latency by causing timeout retransmissions. The challenges in solving packet corruption in the RDMA network include: 1) packet corruption is inevitable whatever remedial mechanisms are used and 2) RDMA hardware is not programmable. This paper proposes designs that can guarantee the expected tail latency of applications in the presence of packet corruption. The key idea is to control the probability of timeout events caused by packet corruption by transforming timeout retransmissions into out-of-order retransmissions. We build a probabilistic model to estimate the occurrence probabilities and real effects of the corruption patterns. We implement these two mechanisms with the help of programmable switches and the zero-byte message RDMA feature. We build an ns-3 simulation and implement the optimization mechanisms on our testbed. The simulation and testbed experiments show that the optimizations can decrease the flow completion time by several orders of magnitude with less than 3% bandwidth cost at different packet corruption rates.
Funding: Project supported by the Second Brain Korea 21 Project and Samsung Electronics
Abstract: In this paper, we propose a fast and simple system emulator, called a system performance emulator (SPE), to evaluate long read operations. The SPE estimates how much system-wide performance is enhanced by using a faster solid state disk (SSD). By suspending a CPU for a certain time during direct memory access (DMA) transfer and subtracting this suspended time from the total DMA time, the SPE estimates the improvement in system performance expected from an enhanced SSD prior to its manufacture. We also examine the relation between storage performance and system performance using the SPE.