Abstract: The author constructed Scp4, a transformer of functional programs. The transformer uses the technology known as Turchin's supercompilation. Scp4 is implemented in the functional language Refal 5, which is also its input language. In the present paper we give an outline of the supercompiler and consider a number of the transformer's tools in detail. The tools can be formally thought of as quasi-distributive laws.
Abstract: Generalized Partial Computation (GPC) is a program transformation method that utilizes partial information about input data, properties of auxiliary functions, and the logical structure of a source program. GPC uses both an inference engine, such as a theorem prover, and a classical partial evaluator to optimize programs. GPC is therefore more powerful than classical partial evaluators but harder to implement and control. We have implemented an experimental GPC system called WSDFU (Waseda Simplify Distribute Fold Unfold). This paper discusses the power of the program transformation system, its theorem prover, and future work.
Abstract: The total pressure loss in the flue running from the electric precipitator of a 350 MW power plant unit to the inlet of the draft fan is too large. To address this problem, the numerical simulation software Fluent and the standard k-ε model were used to simulate the flue. The results show that the main part of the mean total pressure loss of the flue comes from the confluence header and the elbow. To reduce the loss while taking the cost of modification into account, the concept of a two-dimensional feature surface is established, three flue modification schemes are proposed in succession, and their drag reduction effects are analyzed. The results show that, with the best scheme, the total pressure loss of the flue can be reduced from 486 Pa to 89 Pa, a reduction rate of 81.7%. The concept of the two-dimensional feature surface is helpful for quick condensing of the flue, and a double V-type structure of the confluence header has a better drag reduction effect.
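As a quick consistency check on the figures quoted in the abstract above, the reduction rate follows directly from the two pressure-loss values. The symbols below are illustrative assumptions, not notation taken from the paper.

```latex
% Assumed symbols: \Delta p_0 = original total pressure loss, \Delta p_1 = loss after modification
\[
\eta \;=\; \frac{\Delta p_0 - \Delta p_1}{\Delta p_0}
      \;=\; \frac{486\,\mathrm{Pa} - 89\,\mathrm{Pa}}{486\,\mathrm{Pa}}
      \;=\; \frac{397}{486} \;\approx\; 0.817 \;=\; 81.7\%
\]
```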
Funding: We thank the anonymous referees for their constructive comments. This material is based upon work supported by a DOE Early Career Award (DE-SC0013700) and the National Science Foundation (NSF) (1455404, 1455733 (CAREER), 1525609, 1464216, and 1618912). This work is also supported partly by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61272143, 61272144, 61472431) and the National Science and Technology Major Project (NSTMP) (2017ZX01028-101). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DOE, NSF, NSFC or NSTMP.
Abstract: Emerging integrated CPU-GPU architectures make it practical for short computational kernels to use GPU acceleration. Evidence has shown that, on such systems, GPU control responsiveness (how soon the host program finds out about the completion of a GPU kernel) is essential for overall performance. This study identifies the GPU responsiveness dilemma: host busy polling responds quickly, but at the expense of high energy consumption and interference with co-running CPU programs; interrupt-based notification minimizes energy and CPU interference costs, but suffers from substantial response delay. We present a program-level solution that wakes up the host program in anticipation of GPU kernel completion. We systematically explore the design space of an anticipatory wakeup scheme through a timer-delayed wakeup or a kernel-splitting-based pre-completion notification. Experiments show that our proposed technique can achieve the best of both worlds, high responsiveness with low power and CPU costs, for a wide range of GPU workloads.
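The timer-delayed wakeup idea described in the abstract above can be illustrated with a minimal host-side sketch. It is not the authors' implementation: the GPU kernel is replaced by a hypothetical stand-in (`run_fake_kernel`, a timer that sets an event), and the names `anticipatory_wait`, `predicted_s`, and `margin_s` are assumptions made for illustration. The host sleeps for most of the predicted kernel time and only busy-polls during a short window near the expected completion.

```python
import threading
import time


def run_fake_kernel(duration_s: float) -> threading.Event:
    """Hypothetical stand-in for a GPU kernel: sets an event after duration_s."""
    done = threading.Event()
    timer = threading.Timer(duration_s, done.set)
    timer.daemon = True
    timer.start()
    return done


def anticipatory_wait(done: threading.Event, predicted_s: float, margin_s: float) -> dict:
    """Sleep until shortly before the predicted completion, then busy-poll."""
    start = time.perf_counter()
    # Phase 1: a cheap timed sleep covers most of the predicted kernel time.
    time.sleep(max(0.0, predicted_s - margin_s))
    # Phase 2: a short busy-polling window so completion is noticed quickly.
    polls = 0
    while not done.is_set():
        polls += 1
    return {"waited_s": time.perf_counter() - start, "polls": polls}


if __name__ == "__main__":
    kernel_done = run_fake_kernel(0.05)
    stats = anticipatory_wait(kernel_done, predicted_s=0.05, margin_s=0.01)
    print(stats)
```

In this sketch a larger `margin_s` trades more polling (CPU and energy cost) for less risk of waking up late, which mirrors the trade-off the abstract describes between busy polling and interrupt-based notification.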
Funding: Supported by Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists (11-06282).
Abstract: Structures built with constructors are commonly used in functional programming to represent data structures of unbounded size. The lack of associativity of constructors, however, hinders program analysis and efficient execution. This paper describes ideas for abstracting constructors and, similarly, for abstracting from constructing functions, which we call functional constructors. We demonstrate that our ideas make program analyses easier and enable transformation to efficient execution.
Abstract: We survey fundamental concepts for inverse programming and then present the Universal Resolving Algorithm, an algorithm for inverse computation in a first-order functional programming language. We discuss the key concepts of the algorithm, including a three-step approach based on the notion of a perfect process tree, and demonstrate our implementation with several examples of inverse computation.
Abstract: The paradigm of disjunctive logic programming (DLP) greatly enhances the expressive power of normal logic programming (NLP), and many (declarative) semantics have been defined for DLP to cope with various problems of knowledge representation in artificial intelligence. However, the expressive ability of these semantics and the soundness of program transformations for DLP have rarely been explored. This paper defines an immediate consequence operator TGP for each disjunctive program P and shows that TGP has a least and computable fixpoint Lft(P). Lft is, in fact, a program transformation for DLP that transforms every disjunctive program into a negative program. It is shown that Lft preserves many key semantics, including the disjunctive stable models, the well-founded model, the disjunctive argument semantics DAS, three-valued models, etc. This means that every disjunctive program P has a unique canonical form Lft(P) with respect to these semantics. As a result, the work in this paper provides a unifying framework for studying the expressive ability of various semantics for DLP. On the other hand, computing the above semantics for negative programs is a trivial task; therefore, Lft(P) is also an optimization method for DLP. Another application of Lft is to derive some interesting semantic results for DLP.
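The abstract above does not define its operator for disjunctive programs, so the following sketch does not attempt to reproduce it. It only illustrates the general notion of a least fixpoint of an immediate consequence operator, using the classical operator for ground, negation-free (definite) programs. The `Rule` encoding and the function names are assumptions chosen for this example.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumed encoding: (head atom, body atoms); rules are ground and negation-free.
Rule = Tuple[str, FrozenSet[str]]


def immediate_consequence(program: List[Rule], facts: Set[str]) -> Set[str]:
    """One operator application: heads of all rules whose bodies hold in facts."""
    return {head for head, body in program if body <= facts}


def least_fixpoint(program: List[Rule]) -> Set[str]:
    """Iterate the operator from the empty set until nothing new is derived."""
    current: Set[str] = set()
    while True:
        nxt = immediate_consequence(program, current)
        if nxt == current:
            return current
        current = nxt


# Example program:  a.   b :- a.   c :- a, b.   d :- e.
example: List[Rule] = [
    ("a", frozenset()),
    ("b", frozenset({"a"})),
    ("c", frozenset({"a", "b"})),
    ("d", frozenset({"e"})),
]

if __name__ == "__main__":
    print(sorted(least_fixpoint(example)))  # prints ['a', 'b', 'c']
```

Because the operator is monotone and the program is finite, the iteration from the empty set terminates; `d` is never derived because `e` has no supporting rule.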
Funding: We thank the anonymous referees for their constructive comments. This material is based upon work supported by a DOE Early Career Award, the National Science Foundation (NSF) (1455404 and 1525609), and an NSF CAREER Award. This work is also supported partly by the NSF (CNS-1217372, CNS-1239423, CCF-1255729, CNS-1319353, and CNS-1319417) and the National Natural Science Foundation of China (NSFC) (Grant Nos. 61272143, 61272144, and 61472431). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DOE, NSF, or NSFC.
Abstract: Recent years have witnessed a processor development trend that integrates the central processing unit (CPU) and the graphics processing unit (GPU) into a single chip. The integration helps to save some of the host-device data copying that a discrete GPU usually requires, but also introduces deep resource sharing and possible interference between CPU and GPU. This work investigates the performance implications of independently co-running CPU and GPU programs on these platforms. First, we perform a comprehensive measurement that covers a wide variety of factors, including processor architectures, operating systems, benchmarks, timing mechanisms, inputs, and power management schemes. These measurements reveal a number of surprising observations. We analyze these observations and produce a list of novel insights, including the important roles of operating system (OS) context switching and power management in determining program performance, and the subtle effect of CPU-GPU data copying. Finally, we confirm those insights through case studies, and point out some promising directions to mitigate anomalous performance degradation on integrated heterogeneous processors.
Funding: Supported by the Humanities and Social Science Research Youth Foundation of the Ministry of Education of China under Grant No. 11YJC790006, the Center for Research of Regulation and Policy of Zhejiang Province of China under Grant No. 13JDGZ03YB, the project of National Statistical Science of China under Grant No. 2013LY125, and the Higher School Science and Technology Development Foundation of Tianjin of China under Grant No. 20100821.
Abstract: This paper studies an investment and consumption problem with a stochastic interest rate, where the interest rate is governed by the Vasicek model. The financial market is composed of one risk-free asset and one risky asset, in which the stock price dynamics are assumed to be generally correlated with the interest rate dynamics. The aim is to maximize the expected utility of consumption and terminal wealth over a finite horizon. The Legendre transform is used to deal with this investment and consumption problem, and explicit solutions for the optimal investment and consumption strategies under power and logarithmic preferences are obtained. Finally, the authors give a numerical example to analyze the effect of market parameters on the optimal investment and consumption strategy and provide some economic implications.
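For orientation, the Vasicek model named in the abstract above is usually written as a mean-reverting stochastic differential equation, and power and logarithmic preferences are usually given in the textbook forms below. The paper's own notation and parameterization are not shown in the abstract, so every symbol here is an assumption.

```latex
% Assumed symbols: r_t short rate, a > 0 mean-reversion speed, b long-run mean,
% \sigma_r > 0 rate volatility, W_t standard Brownian motion, \gamma risk-aversion parameter
\[
\mathrm{d}r_t = a\,(b - r_t)\,\mathrm{d}t + \sigma_r\,\mathrm{d}W_t
\]
\[
U_{\text{power}}(x) = \frac{x^{\gamma}}{\gamma}, \quad \gamma < 1,\ \gamma \neq 0,
\qquad
U_{\log}(x) = \ln x
\]
```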