Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
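A minimal sketch of one way such an aggregation-based policy iteration can be implemented, assuming hard aggregation: each state is mapped to a single aggregate state by a given feature map `phi`, and disaggregation weights are uniform over the states sharing a feature. The transition tensor `P`, stage costs `g`, and discount `alpha` are illustrative assumptions, not the paper's exact formulation, and `phi` stands in for features that the paper suggests constructing with a neural network or other calculations.

```python
import numpy as np

def aggregate_policy_iteration(P, g, phi, alpha, n_aggregate, iters=20):
    """Approximate policy iteration with hard (feature-based) aggregation.

    P   : array (n_actions, n_states, n_states) of transition probabilities
    g   : array (n_actions, n_states) of expected stage costs
    phi : array (n_states,) mapping each state to its aggregate state (feature);
          every aggregate state is assumed to contain at least one state
    """
    n_actions, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)            # initial policy

    # Disaggregation weights: uniform over the states sharing a feature value.
    D = np.zeros((n_aggregate, n_states))
    for x in range(n_aggregate):
        members = np.flatnonzero(phi == x)
        D[x, members] = 1.0 / len(members)

    # Aggregation matrix: state j belongs to aggregate state phi[j].
    Phi = np.zeros((n_states, n_aggregate))
    Phi[np.arange(n_states), phi] = 1.0

    for _ in range(iters):
        # --- Policy evaluation on the smaller aggregate problem ---
        P_mu = P[mu, np.arange(n_states), :]      # (n_states, n_states)
        g_mu = g[mu, np.arange(n_states)]         # (n_states,)
        P_hat = D @ P_mu @ Phi                    # aggregate transition matrix
        g_hat = D @ g_mu                          # aggregate stage costs
        r = np.linalg.solve(np.eye(n_aggregate) - alpha * P_hat, g_hat)

        # Nonlinear (piecewise-constant) cost approximation for the original states.
        J_tilde = Phi @ r

        # --- Policy improvement by one-step lookahead on the original problem ---
        Q = g + alpha * (P @ J_tilde)             # (n_actions, n_states)
        new_mu = Q.argmin(axis=0)
        if np.array_equal(new_mu, mu):
            break
        mu = new_mu
    return mu, J_tilde
```

In this sketch the approximate cost `J_tilde` is constant on each set of states sharing a feature, which is the nonlinear-in-the-features approximation the abstract contrasts with a linear combination of features.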
Abstract: While moving towards a low-carbon, sustainable electricity system, distribution networks are expected to host a large share of distributed generators, such as photovoltaic units and wind turbines. These inverter-based resources are intermittent but also controllable, and are expected to amplify the role of distribution networks together with other distributed energy resources, such as storage systems and controllable loads. The available control methods for these resources are typically categorized, based on the available communication network, into centralized, distributed, and decentralized or local. Standard local schemes are typically inefficient, whereas centralized approaches raise implementation and cost concerns. This paper focuses on optimized decentralized control of distributed generators via supervised and reinforcement learning. We present existing state-of-the-art decentralized control schemes based on supervised learning, propose a new reinforcement learning scheme based on deep deterministic policy gradient, and compare the behavior of decentralized and centralized methods in terms of computational effort, scalability, privacy awareness, ability to consider constraints, and overall optimality. We evaluate the performance of the examined schemes on a benchmark European low-voltage test system. The results show that both the supervised learning and reinforcement learning schemes effectively mitigate the operational issues faced by the distribution network.
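The proposed reinforcement learning scheme is built on deep deterministic policy gradient (DDPG). The sketch below shows the core actor-critic update of a generic DDPG agent in PyTorch, where the observation is assumed to be a local measurement vector (e.g., voltage magnitude and active power at the generator's bus) and the action a normalized reactive power setpoint; the network sizes, names, and hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a local observation (e.g., voltage, active power) to a setpoint in [-1, 1]."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Estimates Q(observation, action)."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    # batch: (obs, act, rew, next_obs) tensors; rew has shape (batch_size, 1).
    obs, act, rew, next_obs = batch

    # Critic update: regress Q towards the one-step Bellman target.
    with torch.no_grad():
        target_q = rew + gamma * critic_targ(next_obs, actor_targ(next_obs))
    critic_loss = nn.functional.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: ascend the critic's estimate of the actor's own actions.
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Slowly track the online networks with the target networks (Polyak averaging).
    with torch.no_grad():
        for targ, net in ((actor_targ, actor), (critic_targ, critic)):
            for p_t, p in zip(targ.parameters(), net.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)
```

In a decentralized deployment of this kind, training can use richer (e.g., centrally collected) data, but at execution time each generator's actor acts on local measurements only, which is what makes the scheme communication-free during operation.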
Funding: Supported by the Institute for New Economic Thinking, the Grantham Foundation for the Protection of the Environment, and the UK Economic and Social Research Council (ESRC) through the Centre for Climate Change Economics and Policy.
Abstract: The current model of economic growth generated unprecedented increases in human wealth and prosperity during the 19th and 20th centuries. The main mechanisms have been the rapid pace of technological and social innovation, human capital accumulation, and the conversion of resources and natural capital into more valuable forms of produced capital. However, evidence is emerging that this model may be approaching environmental limits and planetary boundaries, and that the conversion of natural capital needs to slow down rapidly and then be reversed. Some commentators have asserted that in order for this to occur, we will need to stop growing altogether and, instead, seek prosperity without growth. Others argue that environmental concerns are low-priority luxuries to be contemplated once global growth has properly returned to levels observed prior to the 2008 financial crisis. A third group argues that there is no trade-off and, instead, promotes green growth: the (politically appealing) idea is that we can simultaneously grow and address our environmental problems. This paper provides a critical perspective on this debate and suggests that a substantial research agenda is required to come to grips with these challenges. One place to start is with the relevant metrics: measures of per-capita wealth, and, eventually, quantitative measures of prosperity, alongside a dashboard of other sustainability indicators. A public and political focus on wealth (a stock), and its annual changes, could realistically complement the current focus on market-based gross output as measured by GDP (a flow). This could have important policy implications, but deeper changes to governance and business models will be required.
Abstract: This paper proposes a privacy-preserving algorithm to solve the average-consensus problem based on Shamir's secret sharing scheme, in which a network of agents reach an agreement on their states without exposing their individual states until an agreement is reached. Unlike other methods, the proposed algorithm renders the network resistant to the collusion of any given number of neighbors (even when all neighbors collude). Another virtue of this work is that the method protects the network consensus procedure from eavesdropping.
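A minimal sketch of the secret-sharing primitive underlying such a scheme, assuming integer-encoded agent states and all-to-all share exchange. It shows how each agent can split its state into Shamir shares so that sums of received shares reveal only the network sum (and hence the average), never an individual state. The field size, threshold, and encoding are illustrative assumptions, not the paper's protocol.

```python
import random

PRIME = 2**61 - 1  # prime field size; assumed large enough to avoid wrap-around

def make_shares(secret, n_agents, threshold):
    """Split `secret` into n_agents Shamir shares; any `threshold` shares reconstruct it,
    while any threshold-1 colluding shares reveal nothing about it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    # The share for agent k is the random polynomial evaluated at x = k (k = 1..n).
    return [(k, sum(c * pow(k, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for k in range(1, n_agents + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from >= threshold shares."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % PRIME
                den = (den * (xj - xm)) % PRIME
        total = (total + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total

# Privacy-preserving average: each agent shares its integer-encoded state,
# every agent sums the shares it receives, and only the sum is reconstructed.
states = [12, 7, 30, 15]                       # private agent states
n, t = len(states), 3                          # threshold t: t-1 colluders learn nothing
all_shares = [make_shares(s, n, t) for s in states]
summed = [(k + 1, sum(all_shares[i][k][1] for i in range(n)) % PRIME)
          for k in range(n)]
average = reconstruct(summed[:t]) / n          # equals sum(states)/n = 16.0
print(average)
```

The key property used here is the additive homomorphism of Shamir shares: summing the shares held at each evaluation point produces valid shares of the summed states, so the average can be reconstructed without any agent's individual state ever being revealed.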