Abstract: This paper presents a model for an automatically parallelizing compiler for C++ that consists of compile-time and run-time parallelization facilities. The paper also describes a method for finding both intra-object and inter-object parallelism. Parallelism detection is completely transparent to users.
Abstract: This paper focuses on the influence of a misspecified covariance structure on the false discovery rate in large-scale multiple testing problems. Specifically, we evaluate the influence on the marginal distribution of local false discovery rate statistics, which are used in many multiple testing procedures and are related to Bayesian posterior probabilities. Explicit forms of the marginal distributions under both correctly specified and incorrectly specified models are derived. The Kullback–Leibler divergence is used to quantify the influence caused by a misspecification. Several numerical examples are provided to illustrate the influence. A real spatio-temporal dataset on soil humidity is discussed.
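The two quantities named above can be made concrete with a minimal sketch, assuming the standard two-group normal model: a null density f0 = N(0, sigma0^2), an alternative f1 = N(mu1, sigma1^2) and a null proportion pi0. The local false discovery rate is then lfdr(z) = pi0 f0(z) / f(z). The closed-form Kullback–Leibler divergence between two univariate normals illustrates how a misspecified null (for example, a wrong variance) can be measured. The function names and the normal choice are illustrative assumptions. This is not the paper's multivariate derivation.

```python
import math

def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    u = (z - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))

def local_fdr(z, pi0, mu1, sigma0=1.0, sigma1=1.0):
    """lfdr(z) = pi0 * f0(z) / (pi0 * f0(z) + (1 - pi0) * f1(z)) (assumed normal components)."""
    f0 = normal_pdf(z, 0.0, sigma0)
    f1 = normal_pdf(z, mu1, sigma1)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)

def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) in closed form."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

# Effect of assuming a null standard deviation of 1.0 when the truth is 1.5.
z = 2.5
print(local_fdr(z, pi0=0.9, mu1=3.0, sigma0=1.0))   # assumed (misspecified) null
print(local_fdr(z, pi0=0.9, mu1=3.0, sigma0=1.5))   # true null
print(kl_normal(0.0, 1.5, 0.0, 1.0))                # cost of the misspecification
```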
Abstract: A genetic algorithm proposed in the literature for the set covering problem included some improvements that gave better solutions, i.e., better chromosomes in the initial population, by taking full account of domain-specific knowledge together with sound programming skill. We have further investigated the input-data dependency of their genetic algorithm, i.e., its dependency on costs and density. We found that for input data sets with densities greater than or equal to 3%, our genetic algorithm remains practical in both computing time and approximation ratio.
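The abstract does not say exactly how domain knowledge improves the starting population. The sketch below uses one common approach, stated here as an assumption. Each chromosome is a 0-1 vector over columns. It is seeded by a noisy "cost per newly covered row" greedy rule and then stripped of redundant columns, so every initial chromosome is feasible. Density is taken to be the fraction of ones in the row/column incidence matrix, which is how the 3% threshold above is usually read. All function names are hypothetical.

```python
import random

def density(cover, n_rows):
    """Fraction of ones in the 0-1 incidence matrix (rows x columns)."""
    return sum(len(rows) for rows in cover) / (n_rows * len(cover))

def is_feasible(genes, cover, n_rows):
    """A chromosome is feasible if the selected columns cover every row."""
    covered = set()
    for j, g in enumerate(genes):
        if g:
            covered |= cover[j]
    return len(covered) == n_rows

def greedy_chromosome(cover, costs, n_rows, rng, noise=0.0):
    """Build a feasible 0-1 chromosome with a noisy cost-per-new-row greedy rule."""
    uncovered = set(range(n_rows))
    genes = [0] * len(cover)
    while uncovered:
        best, best_score = None, float("inf")
        for j, rows in enumerate(cover):
            gain = len(rows & uncovered)
            if genes[j] or gain == 0:
                continue
            # Random perturbation (up to +noise) diversifies the population.
            score = costs[j] / gain * (1.0 + noise * rng.random())
            if score < best_score:
                best, best_score = j, score
        if best is None:
            raise ValueError("infeasible instance: some row has no covering column")
        genes[best] = 1
        uncovered -= cover[best]
    # Remove redundant columns, most expensive first, while staying feasible.
    for j in sorted(range(len(cover)), key=lambda k: -costs[k]):
        if genes[j]:
            genes[j] = 0
            if not is_feasible(genes, cover, n_rows):
                genes[j] = 1
    return genes

def initial_population(cover, costs, n_rows, size, seed=0, noise=0.3):
    """Seed the GA with `size` feasible, greedily built chromosomes."""
    rng = random.Random(seed)
    return [greedy_chromosome(cover, costs, n_rows, rng, noise) for _ in range(size)]

def cost(genes, costs):
    """Objective value (total cost of the selected columns)."""
    return sum(c for g, c in zip(genes, costs) if g)

# Toy instance: 4 rows, 4 candidate columns.
cover = [{0, 1}, {2, 3}, {0, 1, 2, 3}, {1, 2}]
costs = [1, 1, 3, 1]
pop = initial_population(cover, costs, n_rows=4, size=5)
print([cost(g, costs) for g in pop], density(cover, 4))
```

Redundancy removal after the greedy pass matters here. Noisy choices can pick a column that later becomes unnecessary, and dropping it keeps the seeded chromosomes close to locally minimal covers.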
Funding: supported by the National Natural Science Foundation of China (No. 61502043, No. 61132001), the Beijing Natural Science Foundation (No. 4162042), and the Beijing Talents Fund (No. 2015000020124G082).
Abstract: With the growing popularity of data-intensive services on the Internet, the traditional process-centric model of business processes faces challenges. It lacks the ability to describe data semantics and dependencies, which makes process design and implementation inflexible. This paper proposes a novel data-aware business process model that can describe both explicit control flow and implicit data flow. A data model with dependencies formulated in Linear-time Temporal Logic (LTL) is presented, and the satisfiability of these dependencies is validated by an automaton-based model checking algorithm. Data dependencies are fully considered in the modeling phase, which helps improve the efficiency and reliability of programming during the development phase. Finally, a prototype data-aware workflow system based on jBPM is designed using this model. It has been deployed in the Beijing Kingfore heating management system to validate the flexibility, efficacy, and convenience of our approach for massive coding and large-scale system management in practice.
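Consider a typical data dependency, "a data item must not be read before it is written". In LTL this can be written as the weak-until formula (¬read(d)) W write(d). The sketch below checks that one formula on finite, completed execution traces using a three-state automaton: waiting, accepting sink, rejecting sink. This is a simpler runtime monitor, not the paper's satisfiability check over the model. The event encoding `(action, item)` and the function names are hypothetical.

```python
def write_before_read(trace, item):
    """Finite-trace monitor for the LTL pattern (!read(item)) W write(item)."""
    state = "waiting"                 # no write(item) seen yet
    for action, data in trace:
        if data != item:
            continue
        if action == "write":
            state = "ok"              # accepting sink
        elif action == "read":
            state = "violated"        # rejecting sink
        if state != "waiting":
            break                     # both sinks are absorbing
    return state != "violated"

def violated_items(trace, items):
    """Data items whose write-before-read dependency fails on this trace."""
    return [d for d in items if not write_before_read(trace, d)]

trace = [("write", "order"), ("read", "order"),
         ("read", "invoice"), ("write", "invoice")]
print(violated_items(trace, ["order", "invoice"]))   # ['invoice']
```

Weak until is used so that a trace which never touches the item is accepted. A strong until would instead require every item to be written eventually.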
Abstract: Classical unequal erasure protection schemes split the data to be protected into classes that are encoded independently. The unequal protection scheme presented in this paper is instead based on an erasure code that encodes all the data together according to the existing dependencies. A simple algorithm dynamically generates the generator matrix of the erasure code from the structure of the packet stream, i.e., the dependencies between packets, and from the code rate. The proposed erasure code was applied to a packetized MPEG-4 stream transmitted over a packet erasure channel and compared with other classical protection schemes in terms of PSNR and MOS. The results show that the proposed code maintains high video quality over a wider range of packet loss rates than the other protection schemes.
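The abstract does not give the generation algorithm, so the following is only an illustration of the idea under stated assumptions. Source packets are ordered so that each depends only on earlier ones (for example, I-frame packets first). Each parity packet is the XOR (GF(2) sum) of a prefix of the source packets, which places the most depended-upon packets in the most parity equations. A set of erased source packets is recoverable exactly when the received parity rows, restricted to the erased columns, have full column rank over GF(2). Rows are stored as integer bitmasks. The prefix layout and all names are hypothetical.

```python
def prefix_generator_rows(n_source, parity_ends):
    """Parity row r covers source packets 0..parity_ends[r] (bit i = packet i)."""
    rows = []
    for end in parity_ends:
        if not 0 <= end < n_source:
            raise ValueError("parity end out of range")
        rows.append((1 << (end + 1)) - 1)      # bits 0..end set
    return rows

def gf2_rank(vectors):
    """Rank over GF(2) of integer bit-vectors via leading-bit pivots."""
    pivots = {}                                # leading bit -> basis vector
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]                     # clears bit h, lowering the leading bit
    return len(pivots)

def recoverable(parity_rows, erased, lost_parity=()):
    """True if the erased source packets are uniquely solvable from received parity."""
    mask = 0
    for i in erased:
        mask |= 1 << i
    received = [row & mask for r, row in enumerate(parity_rows) if r not in lost_parity]
    return gf2_rank(received) == len(set(erased))

rows = prefix_generator_rows(4, [3, 1, 0])     # packet 0 appears in 3 rows, packet 3 in 1
print(recoverable(rows, {0, 1, 2}))            # True: important packets well protected
print(recoverable(rows, {2, 3}))               # False: late packets protected less
```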
Abstract: Groundwater is the water located beneath the earth's surface in soil pore spaces and in the fractures of rock formations. As one of the most important natural resources, groundwater is linked to the environment, public health, welfare, and long-term economic growth, and it affects the daily activities of human beings. In modern urban areas, the primary groundwater contaminants are artificial products such as gasoline and diesel. To protect this important water resource, a series of efforts have been made, including enforcement and remedial actions. Each year, the Texas Groundwater Protection Committee (TGPC) in the US publishes a "Joint Groundwater Monitoring and Contamination Report" describing historic and new contamination cases in each county, which is an important data source for designing prevention strategies. In this paper, a data dependent modeling (DDM) approach is proposed to predict county-level new contamination cases (NCC). A case study using contamination information from Harris County, Texas, was conducted to illustrate the modeling and prediction process, with promising results: the one-step prediction error is 1.5%, and the two-step error is 12.1%. The established model can be applied at the county level, the state level, and even the country level. In addition, the prediction results could serve as a reference in decision-making processes.
Abstract: The classical Fisher-Tippett-Gnedenko theory shows that the normalized maximum of n iid random variables with distribution F, where F belongs to a very wide class of functions, converges in law to an extremal distribution H that is determined by the tail of F. Extensions of this theory from the iid case to stationary and weakly dependent sequences are well known from the work of Leadbetter, Lindgren and Rootzén. In this paper, we present a very simple class of random processes that ranges from iid sequences to non-stationary and strongly dependent processes, and we study the asymptotic behavior of its normalized maximum. More interestingly, we show that when the process is strongly dependent, the asymptotic distribution is no longer an extremal one but a mixture of extremal distributions. We present very simple theoretical and simulated examples of this result. This provides a simple framework for asymptotic approximations of extreme values not covered by classical extreme value theory and its well-known extensions.
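The following toy simulation shows how strong dependence can turn a Fréchet limit into a mixture. The construction is our own and is not the class of processes defined in the paper. Every term of a realization is multiplied by one shared random scale A, and given A the Y_i are iid Pareto(1) with P(Y > y) = 1/y. Then P(M_n/n ≤ x | A = a) = (1 - a/(nx))^n → exp(-a/x), so the unconditional limit is a mixture of Fréchet(α = 1) laws rather than a single extremal law. The scale values {1, 2} and all names are assumptions.

```python
import math
import random

def normalized_max(n, scale, rng):
    """max(scale * Y_1, ..., scale * Y_n) / n with Y_i iid Pareto(1)."""
    # 1 - rng.random() lies in (0, 1], so its reciprocal is a valid Pareto(1) draw.
    return scale * max(1.0 / (1.0 - rng.random()) for _ in range(n)) / n

def simulate(n=200, reps=2000, scales=(1.0, 2.0), seed=0):
    """One shared random scale per realization makes all terms strongly dependent."""
    rng = random.Random(seed)
    return [normalized_max(n, rng.choice(scales), rng) for _ in range(reps)]

def mixture_cdf(x, scales=(1.0, 2.0)):
    """Limit law: equal-weight mixture of Frechet CDFs exp(-a / x)."""
    return sum(math.exp(-a / x) for a in scales) / len(scales)

def empirical_cdf(sample, x):
    return sum(v <= x for v in sample) / len(sample)

sample = simulate()
for x in (1.0, 2.0, 5.0):
    print(x, round(empirical_cdf(sample, x), 3), round(mixture_cdf(x), 3))
```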
Abstract: Airplanes are a social necessity for moving people, goods, and more. They are generally a safe mode of transportation; however, incidents and accidents occasionally occur. To prevent aviation accidents, it is necessary to develop a machine-learning model that detects and predicts abnormalities in commercial flights using automatic dependent surveillance–broadcast (ADS-B) data. This study combined data-quality detection, anomaly detection, and the development of an abnormality-classification model. The research methodology involved the following stages: problem statement, data selection and labeling, prediction-model development, deployment, and testing. The data labeling process was based on the rules framed by the International Civil Aviation Organization (ICAO) for commercial jet-engine flights and was validated by expert commercial pilots. The results showed that the best prediction model, quadratic discriminant analysis, was 93% accurate, indicating a “good fit”. Moreover, the model's area-under-the-curve (AUC) values for abnormal and normal detection were 0.97 and 0.96, respectively, confirming its “good fit”.
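Quadratic discriminant analysis fits a separate Gaussian, with its own mean and variance, to each class. It then assigns a point to the class with the largest log p(x | y) + log p(y). Because the variances differ, the decision boundary is quadratic. The one-feature sketch below shows how a wide "abnormal" class can flag deviations on both sides of a tight "normal" class. The single feature and the toy values are hypothetical and are not the paper's ADS-B features or data.

```python
import math
from collections import defaultdict

def fit_qda_1d(xs, labels):
    """Per-class (mean, variance, prior) for one feature; variances are MLEs and must be > 0."""
    groups = defaultdict(list)
    for x, y in zip(xs, labels):
        groups[y].append(x)
    params = {}
    for y, vals in groups.items():
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        params[y] = (mu, var, len(vals) / len(xs))
    return params

def predict_qda_1d(params, x):
    """Class maximizing the quadratic discriminant log N(x; mu, var) + log prior."""
    def score(mu, var, prior):
        return (-0.5 * math.log(2.0 * math.pi * var)
                - (x - mu) ** 2 / (2.0 * var) + math.log(prior))
    return max(params, key=lambda y: score(*params[y]))

# Hypothetical feature, e.g. a normalized deviation from a nominal profile.
normal = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
abnormal = [2.5, -3.0, 4.0, -2.0, 3.5]
params = fit_qda_1d(normal + abnormal, ["normal"] * 6 + ["abnormal"] * 5)
print([predict_qda_1d(params, x) for x in (0.0, 3.0, -3.0)])
```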
Funding: Supported by the National Natural Science Foundation of China (No. 61076021), the National Basic Research Program of China (No. 2009CB320903), and the China Postdoctoral Science Foundation (No. 2012M511364).
Abstract: An adaptive pipelining scheme for an H.264/AVC context-based adaptive binary arithmetic coding (CABAC) decoder for high-definition (HD) applications is proposed to solve the data hazard problems that arise from data dependencies in the CABAC decoding process. An efficiency model of the CABAC decoding pipeline is derived from an analysis of a common pipeline, and several adaptive strategies are then provided based on it. With these strategies, the pipelining scheme adapts to different types of syntax elements (SEs), and the pipeline does not stall during decoding. In addition, the proposed decoder fully supports the H.264/AVC High 4:2:2 profile, and experimental results show that its efficiency is much higher than that of other single-engine architectures. Taking both performance and cost into consideration, our design achieves a good tradeoff compared with other work and is sufficient for real-time HD decoding.
Abstract: For software modules, this paper proposes a visual specification language (VSL). Based on decomposition, the language imitates the human thinking process of decomposing a problem into smaller ones, solving each smaller problem independently, and combining their results to obtain the result of the original problem (decomposition and synthesis). In addition, the language combines visual notation with specification. With computer support, software modules can be implemented automatically, which will greatly improve software quality and raise the efficiency of software development. The simple definition of VSL, the principle of automatic generation, an example, and future research are introduced.
Abstract: Where web latency comes from is a widely discussed question. In this paper, we propose a novel chunk-level latency dependence model that better illustrates web latency. Web content is delivered as a sequence of chunks, and clients care more about whole-page retrieval latency. Based on these facts, this paper studies in detail how the chunk sequence and the relations between chunks affect web retrieval latency. A series of thorough experiments and data analyses are also conducted. The results are useful for further study on how to reduce web latency.
Funding: This work was supported by the Universities Natural Science Research Project of Jiangsu Province under Grants 20KJB520026 and 20KJA520002, the Foundation for Young Teachers of Nanjing Auditing University under Grant 19QNPY018, and the National Natural Science Foundation of China under Grants 71972102 and 61902189.
Abstract: With the continuous expansion of software applications, people's requirements for software quality are increasing. Software defect prediction is an important technology for improving software quality. It typically encodes software into several features and applies machine learning methods to build defect prediction classifiers, which estimate whether a software area is clean or buggy. However, current encoding methods are mainly based on traditional manual features or on the abstract syntax tree (AST) of the source code. Traditional manual features can hardly reflect the deep semantics of programs, and the AST contains much noise, which affects the expression of semantic features. To overcome these deficiencies, we combined Convolutional Neural Networks (CNN) with a novel program encoding method based on the compiler Intermediate Representation (IR) for software defect prediction (CIR-CNN). Specifically, first, our program encoding method is based on the compiler IR, which can eliminate much of the noise in the syntax structure of the source code and makes it easier to obtain more accurate semantic information. Second, with the help of data flow analysis, a Data Dependency Graph (DDG) is constructed on the compiler IR, which helps capture deeper semantic information about the program. Finally, we use the widely used CNN model to build the software defect prediction model, which increases the adaptive ability of the method. To evaluate the performance of CIR-CNN, we use seven projects from the PROMISE datasets to set up comparative experiments. The experimental results show that, in within-project defect prediction (WPDP), CIR-CNN improved prediction accuracy by 12% over the AST-encoded CNN-based model and by 20.9% over the traditional feature-based LR model. In cross-project defect prediction (CPDP), it improved on the AST-encoded DBN-based model by 9.1% and on the traditional feature-based TCA+ model by 19.2%.
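A data dependency graph built from data flow analysis links each instruction that reads a value to the instruction that defined it. The sketch below builds those def-use (flow) edges for a straight-line block of a hypothetical three-address IR written as `(dest, opcode, sources)` tuples. It is a simplification. A real compiler IR would need reaching-definitions analysis over the control-flow graph, and this omits anti- and output dependences. It does not reproduce the paper's encoding.

```python
def build_ddg(block):
    """Flow-dependence edges (i, j): instruction j reads a value last written by i.

    `block` is a straight-line list of (dest, opcode, sources); dest may be ""
    for instructions such as stores that define no value.
    """
    last_def = {}
    edges = set()
    for j, (dest, _op, srcs) in enumerate(block):
        for s in srcs:                 # read operands before updating definitions,
            if s in last_def:          # so that "a = add a 1" depends on the old a
                edges.add((last_def[s], j))
        if dest:
            last_def[dest] = j
    return edges

block = [
    ("a", "load", ("p",)),        # 0
    ("b", "load", ("q",)),        # 1
    ("c", "add", ("a", "b")),     # 2
    ("a", "mul", ("c", "2")),     # 3: redefines a
    ("", "store", ("a", "p")),    # 4: uses the new a
]
print(sorted(build_ddg(block)))   # [(0, 2), (1, 2), (2, 3), (3, 4)]
```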
Abstract: The performance of scalable shared-memory multiprocessors suffers from three types of latency: memory latency, the latency caused by inter-process synchronization, and the latency caused by instructions that take multiple cycles to produce results. To tolerate these three types of latency, we propose coupling the following techniques: coarse-grained multithreading, a superscalar processor, and a reconfigurable device. Coarse-grained multithreading overlaps the long-latency operations of one thread of computation with the execution of other threads. The superscalar principle is used to tolerate instruction latency by issuing several instructions simultaneously. The reconfigurable device, a DPGA, is coupled with this processor to improve context switching.
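The benefit of coarse-grained multithreading can be estimated with an idealized round-robin model. This model is not from the paper. Assume each thread runs R cycles and then issues a long-latency operation that completes L cycles later, and that a context switch costs C cycles and overlaps that latency. A thread cannot restart sooner than R + L cycles after it started, and the processor needs N(R + C) cycles to serve all N threads once. Utilization is therefore N·R / max(R + L, N(R + C)). It grows linearly with N until it saturates at R / (R + C), which is why a cheaper context switch, the goal of the DPGA coupling above, raises the ceiling. All names and parameters are illustrative assumptions.

```python
def utilization(n_threads, run_length, latency, switch_cost):
    """Idealized processor utilization under coarse-grained multithreading.

    Assumes deterministic round-robin scheduling, equal run lengths, and a
    switch cost that overlaps the latency (latency >= switch_cost).
    """
    if n_threads < 1:
        raise ValueError("need at least one thread")
    busy = n_threads * run_length
    period = max(run_length + latency, n_threads * (run_length + switch_cost))
    return busy / period

for n in (1, 5, 9, 20):
    print(n, round(utilization(n, run_length=10, latency=90, switch_cost=2), 3))
```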
Abstract: To detect more attacks targeting key security data in program behavior-based anomaly detection, data flow properties are formulated as unary and binary relations on system call arguments. A new method named two-phase analysis (2PA) is designed to analyze relation dependencies efficiently, and its description and advantages are discussed. During the static analysis phase, a dependency graph is constructed from the program's data dependency graph. The dynamic learning phase then uses this graph to learn specified binary relations. The constructed dependency graph stores only information about related arguments and events, which improves the efficiency of the learning algorithm and reduces the size of the learned relation dependencies. Performance evaluations show that the new method is more efficient than existing methods.
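The two phases can be illustrated with one kind of binary relation: "argument X of system call B must equal some value previously produced by field Y of system call A". An example is the `fd` of `read` matching a return value of `open`. In this sketch the candidate pairs are hand-written and stand in for what the static dependency graph would supply. Dynamic learning keeps only the pairs that hold on every training trace, and detection reports learned pairs that a new trace violates. The trace format, field names, and functions are hypothetical. Unary relations are not shown.

```python
def relation_holds(trace, src, dst):
    """Every dst-field value must equal some earlier src-field value in the trace."""
    produced = set()
    for name, args in trace:
        if name == dst[0] and dst[1] in args and args[dst[1]] not in produced:
            return False
        if name == src[0] and src[1] in args:
            produced.add(args[src[1]])
    return True

def learn(traces, candidates):
    """Dynamic phase: keep candidate (src, dst) pairs that hold on all training traces."""
    return [pair for pair in candidates
            if all(relation_holds(t, *pair) for t in traces)]

def detect(trace, learned):
    """Learned relations violated by a new trace."""
    return [pair for pair in learned if not relation_holds(trace, *pair)]

train = [
    [("open", {"path": "/etc/passwd", "ret": 3}), ("read", {"fd": 3, "ret": 10}),
     ("close", {"fd": 3})],
    [("open", {"path": "/tmp/x", "ret": 4}), ("read", {"fd": 4, "ret": 5})],
]
# Candidate pairs that a static dependency graph might propose (hypothetical).
candidates = [(("open", "ret"), ("read", "fd")),
              (("open", "ret"), ("close", "fd")),
              (("read", "ret"), ("close", "fd"))]
learned = learn(train, candidates)
attack = [("open", {"path": "/etc/passwd", "ret": 3}), ("read", {"fd": 7, "ret": 100})]
print(detect(attack, learned))   # [(('open', 'ret'), ('read', 'fd'))]
```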
Abstract: In this article, we improve a goodness-of-fit test of the Kolmogorov-Smirnov type for strongly dependent data that are equally distributed but not stationary. The test is based on the asymptotic behavior of the empirical process, which is much more complex than in the classical case. Applications to simulated data and a discussion of the results obtained are provided. To the best of our knowledge, this is the first result providing a general goodness-of-fit test for non-weakly dependent data.
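For reference, a Kolmogorov-Smirnov type test is built on the supremum distance between the empirical CDF of the sample and the hypothesized CDF. The sketch below computes only that distance. Under strong dependence the classical Kolmogorov limiting distribution need not apply, so critical values would have to come from the asymptotic theory developed in the paper, which is not reproduced here.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance sup_x |F_n(x) - F(x)| between the empirical
    CDF of `sample` and a hypothesised continuous CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))
```

For this three-point sample the largest gap occurs just after the last observation, where the empirical CDF reaches 1 while the uniform CDF is 0.7, giving a distance of about 0.3.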
Abstract: In this paper, we provide a quantile-based method to estimate the parameters of a finite mixture of Fréchet distributions for a large sample of strongly dependent data. This situation arises when dealing with environmental data, and such a method was genuinely needed. We validate our approach through estimation and goodness-of-fit testing on simulated data, showing accurate performance.
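The building blocks of a quantile-based estimator for a Fréchet mixture can be sketched as below. The abstract does not describe the paper's exact scheme, so this is a generic illustration under assumptions: components are hypothetical `(weight, alpha, scale, loc)` tuples, the mixture quantile is obtained by bisection because a mixture has no closed-form inverse CDF, and the fitting criterion is a simple squared distance between empirical and model quantiles.

```python
import math

def frechet_cdf(x, alpha, scale, loc=0.0):
    """Frechet CDF F(x) = exp(-((x - loc) / scale) ** -alpha) for x > loc, else 0."""
    if x <= loc:
        return 0.0
    log_term = -alpha * math.log((x - loc) / scale)   # log of ((x-loc)/scale)^-alpha
    if log_term > 700.0:                              # exp(-huge) is 0 in floats
        return 0.0
    return math.exp(-math.exp(log_term))

def mixture_cdf(x, components):
    """components: list of (weight, alpha, scale, loc); weights sum to 1."""
    return sum(w * frechet_cdf(x, a, s, m) for w, a, s, m in components)

def mixture_quantile(p, components, tol=1e-10, max_iter=200):
    """Invert the mixture CDF by bisection; requires 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo = min(m for _, _, _, m in components)          # F(lo) = 0 for every component
    hi = lo + 1.0
    while mixture_cdf(hi, components) < p:            # widen until F(hi) >= p
        hi = lo + 2.0 * (hi - lo)
    for _ in range(max_iter):                         # cap guards against float stalls
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, components) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def quantile_loss(components, sample, probs):
    """Squared distance between empirical and model quantiles at levels `probs`;
    a quantile-based estimator would minimise this over the parameters."""
    xs = sorted(sample)
    n = len(xs)
    return sum((xs[min(n - 1, int(p * n))] - mixture_quantile(p, components)) ** 2
               for p in probs)
```

In practice the loss would be handed to a general-purpose optimizer (for example `scipy.optimize.minimize`) with suitable parameter constraints; how the paper chooses the quantile levels and accounts for strong dependence is not captured by this sketch.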
Abstract: Data fusion is an attractive topic in sonar signal processing. This paper describes decision-level data fusion for a multi-sensor (multi-array) system. Following the discussion in Ref. [1], the optimum linear data fusion algorithm for N dependent observations is derived. It is proved that the estimation error of the fused result is not greater than that of any individual component. Expressions for the estimation error and the weight coefficients are presented. Numerical results and some examples are given, and the effect of dependence among the observations on the final estimation error is presented.
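The optimum linear fusion rule for dependent observations has a standard closed form when the fused value is an unbiased weighted sum (weights summing to one) and the error covariance matrix C of the N observations is known: w = C^{-1} 1 / (1^T C^{-1} 1), with fused error variance 1 / (1^T C^{-1} 1). The sketch below assumes exactly that setting; the paper's derivation, which follows Ref. [1], may be formulated differently.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fuse(estimates, cov):
    """Minimum-variance unbiased linear fusion of N correlated estimates.

    Weights w = C^{-1} 1 / (1^T C^{-1} 1); fused variance = 1 / (1^T C^{-1} 1).
    """
    u = solve(cov, [1.0] * len(estimates))     # u = C^{-1} 1
    total = sum(u)                             # 1^T C^{-1} 1
    weights = [ui / total for ui in u]
    fused = sum(w * x for w, x in zip(weights, estimates))
    return fused, weights, 1.0 / total

print(fuse([10.0, 20.0], [[1.0, 0.0], [0.0, 4.0]]))
```

For two independent observations with variances 1 and 4, the weights are 0.8 and 0.2 and the fused variance is 0.8, below the smaller individual variance. With positive correlation the gain shrinks: two observations of variance 1 with covariance 0.5 fuse to variance 0.75 rather than 0.5, which illustrates how dependence affects the final estimation error.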
Funding: Supported by the National Numerical Wind Tunnel Project NNW2019ZT6-B18 and by Guangdong Introducing Innovative & Entrepreneurial Teams under Grant No. 2016ZT06D211.
Abstract: In the unstructured finite volume method, loops over different mesh components such as cells, faces, and nodes are widely used to traverse data. Mesh loops result in direct or indirect data access, which significantly affects data locality. When looping over the mesh, many threads may access the same data, which leads to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under cell, face, and node loops is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that the mesh loop modes affect data locality and data dependence differently. Specifically, the face loop yields the best data locality whenever kernels access face data. When kernels use both cell and node data but no face data, the cell loop incurs the smallest overhead from non-coalesced data access. The cell loop performs best when kernels contain only indirect access to cell data. Atomic operations greatly reduced kernel performance on the K80, an effect that is not obvious on the V100. With a suitable mesh loop mode in every kernel, the overall performance of the GPU simulation can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a maximum speedup of 21.7 and an average speedup of 14.1 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.
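The trade-off between face and cell loops can be seen in a residual accumulation, sketched below under stated assumptions: faces are hypothetical `(owner, neighbour)` pairs (`None` marks a boundary face), `flux[f]` is positive from owner to neighbour, and Python runs sequentially, so the GPU race condition is described in comments rather than reproduced. Both loops give the same residual; they differ in how memory is written and read.

```python
def residual_face_loop(n_cells, faces, flux):
    """Face loop: each face scatters its flux to both adjacent cells.
    On a GPU, two faces sharing a cell would write the same entry
    concurrently (data dependence), so these updates need atomics."""
    res = [0.0] * n_cells
    for f, (owner, neighbour) in enumerate(faces):
        res[owner] -= flux[f]              # flux leaves the owner cell
        if neighbour is not None:          # boundary faces have no neighbour
            res[neighbour] += flux[f]      # and enters the neighbour cell
    return res

def residual_cell_loop(n_cells, faces, flux):
    """Cell loop: each cell gathers the fluxes of its own faces and writes
    only its own entry, so there are no write conflicts, at the cost of
    indirect (often non-coalesced) reads through a cell-to-face map."""
    cell_faces = [[] for _ in range(n_cells)]
    for f, (owner, neighbour) in enumerate(faces):
        cell_faces[owner].append((f, -1.0))
        if neighbour is not None:
            cell_faces[neighbour].append((f, 1.0))
    return [sum(sign * flux[f] for f, sign in cell_faces[c]) for c in range(n_cells)]

# Hypothetical 1-D strip of three cells: two interior faces and one boundary face.
faces = [(0, 1), (1, 2), (2, None)]
flux = [1.0, 2.0, 0.5]
print(residual_face_loop(3, faces, flux), residual_cell_loop(3, faces, flux))
```

Both calls produce `[-1.0, -1.0, 1.5]`. Which form is faster on a given GPU depends on the locality and atomic-operation costs the abstract measures, for example atomics being much more expensive on the K80 than on the V100.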
Abstract: This paper presents a model for an automatically parallelizing C++ compiler, which consists of compile-time and run-time parallelization facilities. The paper also describes a method for finding both intra-object and inter-object parallelism. Parallelism detection is completely transparent to users.
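The abstract does not say how intra-object and inter-object parallelism is detected, so the sketch below swaps in a standard criterion, Bernstein's conditions, purely as an illustration: two operations may run in parallel if neither writes data the other reads or writes. The read/write sets of the method calls are hypothetical, keyed by `object.field`.

```python
def independent(a, b):
    """Bernstein's conditions: operations a and b may run in parallel if
    W_a & R_b, R_a & W_b and W_a & W_b are all empty."""
    return not (a["writes"] & b["reads"]
                or a["reads"] & b["writes"]
                or a["writes"] & b["writes"])

# Hypothetical read/write sets of method calls on two accounts and a point p.
acc1_deposit = {"reads": {"acc1.balance"}, "writes": {"acc1.balance"}}
acc2_deposit = {"reads": {"acc2.balance"}, "writes": {"acc2.balance"}}
p_get_x = {"reads": {"p.x"}, "writes": set()}
p_set_y = {"reads": set(), "writes": {"p.y"}}
p_set_x = {"reads": set(), "writes": {"p.x"}}

print(independent(acc1_deposit, acc2_deposit))  # inter-object parallelism
print(independent(p_get_x, p_set_y))            # intra-object, disjoint fields
print(independent(p_get_x, p_set_x))            # read/write conflict on p.x
```

The first two pairs are independent (calls on different objects, and calls on the same object touching disjoint fields), while the last pair conflicts on `p.x` and must stay ordered. A compiler would need alias and effect analysis to derive such sets, which this sketch takes as given.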
Funding: This research is partially supported by the National Science Foundation [grant number OIA-1301789].
Abstract: This paper focuses on the influence of a misspecified covariance structure on the false discovery rate in large-scale multiple testing problems. Specifically, we evaluate its influence on the marginal distribution of local false discovery rate statistics, which are used in many multiple testing procedures and are related to Bayesian posterior probabilities. Explicit forms of the marginal distributions under both correctly and incorrectly specified models are derived. The Kullback–Leibler divergence is used to quantify the influence of a misspecification. Several numerical examples are provided to illustrate the influence, and a real spatio-temporal dataset on soil humidity is discussed.
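Two quantities named in the abstract have simple textbook forms, sketched here under assumptions that do not reproduce the paper's derivation. The first is the local false discovery rate in the two-group model, lfdr(z) = pi0 f0(z) / (pi0 f0(z) + (1 - pi0) f1(z)), assuming a N(0, 1) null density f0 and a user-supplied alternative density `f1`. The second is the closed-form Kullback–Leibler divergence between two univariate normals, the simplest case of comparing a correctly and an incorrectly specified distribution.

```python
import math

def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def local_fdr(z, pi0, f1):
    """Two-group model lfdr(z) = pi0 f0(z) / f(z) with f0 = N(0, 1) and
    f = pi0 f0 + (1 - pi0) f1; it is the posterior probability that the
    hypothesis producing z is null."""
    f0z = normal_pdf(z)
    return pi0 * f0z / (pi0 * f0z + (1.0 - pi0) * f1(z))

def kl_normal(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2) || N(mu2, s2^2) ) in closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * s2 ** 2) - 0.5

# Hypothetical alternative: non-null statistics centred at 3.
alt = lambda z: normal_pdf(z, mu=3.0)
print(local_fdr(0.0, 0.9, alt), local_fdr(3.0, 0.9, alt))
print(kl_normal(0.0, 1.0, 0.0, 1.5))   # e.g. true vs. misspecified spread
```

With 90% nulls, a statistic at 0 is almost certainly null (lfdr close to 1), while one at 3 has an lfdr of roughly 0.09. The divergence is zero only when the two normals coincide and grows as the assumed spread or mean departs from the true one.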