To investigate the effects of various random factors on the preventive maintenance (PM) decision-making of one type of two-unit series system, an optimal quasi-periodic PM policy is introduced. Assume that PM is per...To investigate the effects of various random factors on the preventive maintenance (PM) decision-making of one type of two-unit series system, an optimal quasi-periodic PM policy is introduced. Assume that PM is perfect for unit 1 and only mechanical service for unit 2 in the model. PM activity is randomly performed according to a dynamic PM plan distributed in each implementation period. A replacement is determined based on the competing results of unplanned and planned replacements. The unplanned replacement is trigged by a catastrophic failure of unit 2, and the planned replacement is executed when the PM number reaches the threshold N. Through modeling and analysis, a solution algorithm for an optimal implementation period and the PM number is given, and optimal process and parametric sensitivity are provided by a numerical example. Results show that the implementation period should be decreased as soon as possible under the condition of meeting the needs of practice, which can increase mean operating time and decrease the long-run cost rate.展开更多
This article studies the inshore-offshore fishery model with impulsive diffusion. The existence and global asymptotic stability of both the trivial periodic solution and the positive periodic solution are obtained. Th...This article studies the inshore-offshore fishery model with impulsive diffusion. The existence and global asymptotic stability of both the trivial periodic solution and the positive periodic solution are obtained. The complexity of this system is also analyzed. Moreover, the optimal harvesting policy are given for the inshore subpopulation, which includes the maximum sustainable yield and the corresponding harvesting effort.展开更多
This paper considers a model of an insurance company which is allowed to invest a risky asset and to purchase proportional reinsurance. The objective is to find the policy which maximizes the expected total discounted...This paper considers a model of an insurance company which is allowed to invest a risky asset and to purchase proportional reinsurance. The objective is to find the policy which maximizes the expected total discounted dividend pay-out until the time of bankruptcy and the terminal value of the company under liquidity constraint. We find the solution of this problem via solving the problem with zero terminal value. We also analyze the influence of terminal value on the optimal policy.展开更多
ARINC653 systems, which have been widely used in avionics industry, are an important class of safety-critical applications. Partitions are the core concept in the Arinc653 system architecture. Due to the existence of ...ARINC653 systems, which have been widely used in avionics industry, are an important class of safety-critical applications. Partitions are the core concept in the Arinc653 system architecture. Due to the existence of partitions, the system designer must allocate adequate time slots statically to each partition in the design phase. Although some time slot allocation policies could be borrowed from task scheduling policies, no existing literatures give an optimal allocation policy. In this paper, we present a partition configuration policy and prove that this policy is optimal in the sense that if this policy fails to configure adequate time slots to each partition, nor do other policies. Then, by simulation, we show the effects of different partition configuration policies on time slot allocation of partitions and task response time, respectively.展开更多
This paper employs a stochastic endogenous growth model extended to the case of a recursive utility function which can disentangle intertemporal substitution from risk aversion to analyze productive government expendi...This paper employs a stochastic endogenous growth model extended to the case of a recursive utility function which can disentangle intertemporal substitution from risk aversion to analyze productive government expenditure and optimal fiscal policy, particularly stresses the importance of factor income. First, the explicit solutions of the central planner's stochastic optimization problem are derived, the growth maximizing and welfare-maximizing government expenditure policies are obtained and their standing in conflict or coincidence depends upon intertemporal substitution. Second, the explicit solutions of the representative individual's stochastic optimization problem which permits to tax on capital income and labor income separately are derived ,and it is found that the effect of risk on growth crucially depends on the degree of risk aversion,the intertemporal elasticity of substitution and the capital income share. Finally, a flexible optimal tax policy which can be internally adjusted to a certain extent is derived, and it is found that the distribution of factor income plays an important role in designing the optimal tax policy.展开更多
This paper aims to improve the performance of a class of distributed parameter systems for the optimal switching of actuators and controllers based on event-driven control. It is assumed that in the available multiple...This paper aims to improve the performance of a class of distributed parameter systems for the optimal switching of actuators and controllers based on event-driven control. It is assumed that in the available multiple actuators, only one actuator can receive the control signal and be activated over an unfixed time interval, and the other actuators keep dormant. After incorporating a state observer into the event generator, the event-driven control loop and the minimum inter-event time are ultimately bounded. Based on the event-driven state feedback control, the time intervals of unfixed length can be obtained. The optimal switching policy is based on finite horizon linear quadratic optimal control at the beginning of each time subinterval. A simulation example demonstrate the effectiveness of the proposed policy.展开更多
This paper presents an optimal production model for manufacturer in a supply chain with a fixed demand at a fixed interval with respect to the learning effect on production capacity. An algorithm is employed to find t...This paper presents an optimal production model for manufacturer in a supply chain with a fixed demand at a fixed interval with respect to the learning effect on production capacity. An algorithm is employed to find the optimal delay time for production and production time sequentially. It is found that the optimal delay time for production and the production time are not static, but dynamic and variant with time. It is important for a manufacturer to schedule the production so as to prevent facilities and workers from idling.展开更多
This paper studies the optimal policy for joint control of admission, routing, service, and jockeying in a queueing system consisting of two exponential servers in parallel.Jobs arrive according to a Poisson process.U...This paper studies the optimal policy for joint control of admission, routing, service, and jockeying in a queueing system consisting of two exponential servers in parallel.Jobs arrive according to a Poisson process.Upon each arrival, an admission/routing decision is made, and the accepted job is routed to one of the two servers with each being associated with a queue.After each service completion, the servers have an option of serving a job from its own queue, serving a jockeying job from another queue, or staying idle.The system performance is inclusive of the revenues from accepted jobs, the costs of holding jobs in queues, the service costs and the job jockeying costs.To maximize the total expected discounted return, we formulate a Markov decision process(MDP) model for this system.The value iteration method is employed to characterize the optimal policy as a hedging point policy.Numerical studies verify the structure of the hedging point policy which is convenient for implementing control actions in practice.展开更多
To investigate the equilibrium relationships between the volatility of capital and income, taxation, and ance in a stochastic control model, the uniqueness of the solution to this model was proved by using the method ...To investigate the equilibrium relationships between the volatility of capital and income, taxation, and ance in a stochastic control model, the uniqueness of the solution to this model was proved by using the method of dynamic programming under the introduction of distributive disturbance and elastic labor supply. Furthermore, the effects of two types of shocks on labor-leisure choice, economic growth rate and welfare were numerically analyzed, and then the optimal tax policy was derived.展开更多
In this paper, the exploitation of single population modelled by Richards model is studied. By choosing the maximum annual-sustainable yield as management objective, we investigate the optimal harvesting policies for ...In this paper, the exploitation of single population modelled by Richards model is studied. By choosing the maximum annual-sustainable yield as management objective, we investigate the optimal harvesting policies for autonomous and periodic exploited Richards model. Further, when the functions in the exploited Richards model are stably bounded functions, we study the ultimately optimal harvesting policy and obtain the corresponding average limiting maximum sustainable yield.展开更多
I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replac...I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete-time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation and some incomplete information concerned with the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one option from them when the imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy minimizing a total expected discounted cost for an unbounded horizon. Then two stochastic orders are used for the analysis of our problem.展开更多
Federated learning enables data owners in the Internet of Things(IoT)to collaborate in training models without sharing private data,creating new business opportunities for building a data market.However,in practical o...Federated learning enables data owners in the Internet of Things(IoT)to collaborate in training models without sharing private data,creating new business opportunities for building a data market.However,in practical operation,there are still some problems with federated learning applications.Blockchain has the characteristics of decentralization,distribution,and security.The blockchain-enabled federated learning further improve the security and performance of model training,while also expanding the application scope of federated learning.Blockchain has natural financial attributes that help establish a federated learning data market.However,the data of federated learning tasks may be distributed across a large number of resource-constrained IoT devices,which have different computing,communication,and storage resources,and the data quality of each device may also vary.Therefore,how to effectively select the clients with the data required for federated learning task is a research hotspot.In this paper,a two-stage client selection scheme for blockchain-enabled federated learning is proposed,which first selects clients that satisfy federated learning task through attribute-based encryption,protecting the attribute privacy of clients.Then blockchain nodes select some clients for local model aggregation by proximal policy optimization algorithm.Experiments show that the model performance of our two-stage client selection scheme is higher than that of other client selection algorithms when some clients are offline and the data quality is poor.展开更多
This paper aim is to examine the optimal pricing and return policies for false failure returns in a dual-channel supply chain. Four prevailing return policies in which a manufacturer both operates an E-shop and sells ...This paper aim is to examine the optimal pricing and return policies for false failure returns in a dual-channel supply chain. Four prevailing return policies in which a manufacturer both operates an E-shop and sells its product through a brick-and-mortar retailer are analyzed, i.e. (I) the manufacturer handlings E-shop's returns, while the retailer addresses brick-and-mortar store's returns (NR); (II) the retailer tackles the whole (both E-shop's and brick-and-mortar store's) returns (ORR); (III) the manufacturer tackles the whole returns (ORM); and (IV) the manufacturer and the retailer are jointly responsible for the whole returns (RRM). Firstly, the optimal pricing and return policies comparing these four scenarios under uniform-pricing strategy are presented. Our conclusions show that the ORR is an optimal return policy. Compared with the NR, consumers will get a lower product pricing under the ORR and a higher product pricing under the ORM. With regard to the RRM, the product pricing is depended on consumer preference, return-rates of the E-shop and the brick-and-mortar store. Then, the optimal pricing and return policies are analyzed under differential-pricing strategy by conducting two-stage sequential games between the manufacturer and the retailer. The findings show that if consumers in the market prefer to purchase via the E-shop, the ORR is an optimal return policy. Otherwise, the NR is the optimal return policy. Compared with the NR, the ORR retailer's product pricing will rely on the retailer's and the manufacturer's return-costs; the RRM retailer's product pricing will depend on the return-costs of the retailer and the manufacturer, the return-rates of the E-shop and the brick-and-mortar store and so on. Finally, the influences of the manufacturer and the retailer establishing a Buy-back contract are discussed. Our results illustrated that the Buy-back contract doesn't affect optimal pricing and return policies under both the uniform and the differential pricing strategies.展开更多
Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities.In practice,some transition probabilities may be uncertain.The goals of the present study are to find the rob...Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities.In practice,some transition probabilities may be uncertain.The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities.Our research yields powerful contributions for Markov decision processes(MDPs)with uncertain transition probabilities.We first propose a method for estimating unknown transition probabilities based on maximum likelihood.Since the estimation may be far from accurate,and the highest expected total reward of the MDP may be sensitive to these transition probabilities,we analyze the robustness of an optimal policy and propose an approach for robust analysis.After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers,we formulate a model to obtain the optimal policy.Finally,we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds.Numerical examples are given to show the practicability of our methods.展开更多
The scale of ground-to-air confrontation task assignments is large and needs to deal with many concurrent task assignments and random events.Aiming at the problems where existing task assignment methods are applied to...The scale of ground-to-air confrontation task assignments is large and needs to deal with many concurrent task assignments and random events.Aiming at the problems where existing task assignment methods are applied to ground-to-air confrontation,there is low efficiency in dealing with complex tasks,and there are interactive conflicts in multiagent systems.This study proposes a multiagent architecture based on a one-general agent with multiple narrow agents(OGMN)to reduce task assignment conflicts.Considering the slow speed of traditional dynamic task assignment algorithms,this paper proposes the proximal policy optimization for task assignment of general and narrow agents(PPOTAGNA)algorithm.The algorithm based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning(DRL)adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to solve the problem of low training efficiency.Finally,simulation experiments are carried out in the digital battlefield.The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm can obtain higher rewards faster and has a higher win ratio.By analyzing agent behavior,the efficiency,superiority and rationality of resource utilization of this method are verified.展开更多
To guarantee the heterogeneous delay requirements of the diverse vehicular services,it is necessary to design a full cooperative policy for both Vehicle to Infrastructure(V2I)and Vehicle to Vehicle(V2V)links.This pape...To guarantee the heterogeneous delay requirements of the diverse vehicular services,it is necessary to design a full cooperative policy for both Vehicle to Infrastructure(V2I)and Vehicle to Vehicle(V2V)links.This paper investigates the reduction of the delay in edge information sharing for V2V links while satisfying the delay requirements of the V2I links.Specifically,a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and ensure the fairness of a single user,respectively.A multi-agent reinforcement learning framework is designed to solve these two problems,where a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework.Thereafter,a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward.The effectiveness of the proposed approach is finally validated by comparing the obtained results with those of the other baseline approaches through extensive simulation experiments.展开更多
Directly applying the B-spline interpolation function to process plate cams in a computer numerical control(CNC)system may produce verbose tool-path codes and unsmooth trajectories.This paper is devoted to addressing ...Directly applying the B-spline interpolation function to process plate cams in a computer numerical control(CNC)system may produce verbose tool-path codes and unsmooth trajectories.This paper is devoted to addressing the problem of B-splinefitting for cam pitch curves.Considering that the B-spline curve needs to meet the motion law of the follower to approximate the pitch curve,we use the radial error to quantify the effects of thefitting B-spline curve and the pitch curve.The problem thus boils down to solving a difficult global optimization problem tofind the numbers and positions of the control points or data points of the B-spline curve such that the cumulative radial error between thefitting curve and the original curve is minimized,and this problem is attempted in this paper with a double deep Q-network(DDQN)reinforcement learning(RL)algorithm with data points traceability.Specifically,the RL envir-onment,actions set and current states set are designed to facilitate the search of the data points,along with the design of the reward function and the initialization of the neural network.The experimental results show that when the angle division value of the actions set isfixed,the proposed algorithm can maximize the number of data points of the B-spline curve,and accurately place these data points to the right positions,with the minimum average of radial errors.Our work establishes the theoretical foundation for studying splinefitting using the RL method.展开更多
In communication networks with policy-based Transport Control on-Demand (TCoD) function,the transport control policies play a great impact on the network effectiveness. To evaluate and optimize the transport policies ...In communication networks with policy-based Transport Control on-Demand (TCoD) function,the transport control policies play a great impact on the network effectiveness. To evaluate and optimize the transport policies in communication network,a policy-based TCoD network model is given and a comprehensive evaluation index system of the network effectiveness is put forward from both network application and handling mechanism perspectives. A TCoD network prototype system based on Asynchronous Transfer Mode/Multi-Protocol Label Switching (ATM/MPLS) is introduced and some experiments are performed on it. The prototype system is evaluated and analyzed with the comprehensive evaluation index system. The results show that the index system can be used to judge whether the communication network can meet the application requirements or not,and can provide references for the optimization of the transport policies so as to improve the communication network effectiveness.展开更多
According to this paper, the dragon-shape strategy is the optimized option of China's future strategy with respect to the geographic distribution of regional economy.
基金The National Natural Science Foundation of China(No.51275090,71201025)the Program for Special Talent in Six Fields of Jiangsu Province(No.2008144)+1 种基金the Scientific Research Foundation of Graduate School of Southeast University(No.YBJJ1302)the Scientific Innovation Research of College Graduates in Jiangsu Province(No.CXLX12_0078)
文摘To investigate the effects of various random factors on the preventive maintenance (PM) decision-making of one type of two-unit series system, an optimal quasi-periodic PM policy is introduced. Assume that PM is perfect for unit 1 and only mechanical service for unit 2 in the model. PM activity is randomly performed according to a dynamic PM plan distributed in each implementation period. A replacement is determined based on the competing results of unplanned and planned replacements. The unplanned replacement is trigged by a catastrophic failure of unit 2, and the planned replacement is executed when the PM number reaches the threshold N. Through modeling and analysis, a solution algorithm for an optimal implementation period and the PM number is given, and optimal process and parametric sensitivity are provided by a numerical example. Results show that the implementation period should be decreased as soon as possible under the condition of meeting the needs of practice, which can increase mean operating time and decrease the long-run cost rate.
文摘This article studies the inshore-offshore fishery model with impulsive diffusion. The existence and global asymptotic stability of both the trivial periodic solution and the positive periodic solution are obtained. The complexity of this system is also analyzed. Moreover, the optimal harvesting policy are given for the inshore subpopulation, which includes the maximum sustainable yield and the corresponding harvesting effort.
基金Supported by Doctor Foundation of Xinjiang Universitythe National Natural Science Foundation of China
文摘This paper considers a model of an insurance company which is allowed to invest a risky asset and to purchase proportional reinsurance. The objective is to find the policy which maximizes the expected total discounted dividend pay-out until the time of bankruptcy and the terminal value of the company under liquidity constraint. We find the solution of this problem via solving the problem with zero terminal value. We also analyze the influence of terminal value on the optimal policy.
基金supported by the National Natural Science Foundation of China under Grant No. 90718019the National High-Tech Research and Development Plan of China under Grant No. 2007AA010304
文摘ARINC653 systems, which have been widely used in avionics industry, are an important class of safety-critical applications. Partitions are the core concept in the Arinc653 system architecture. Due to the existence of partitions, the system designer must allocate adequate time slots statically to each partition in the design phase. Although some time slot allocation policies could be borrowed from task scheduling policies, no existing literatures give an optimal allocation policy. In this paper, we present a partition configuration policy and prove that this policy is optimal in the sense that if this policy fails to configure adequate time slots to each partition, nor do other policies. Then, by simulation, we show the effects of different partition configuration policies on time slot allocation of partitions and task response time, respectively.
文摘This paper employs a stochastic endogenous growth model extended to the case of a recursive utility function which can disentangle intertemporal substitution from risk aversion to analyze productive government expenditure and optimal fiscal policy, particularly stresses the importance of factor income. First, the explicit solutions of the central planner's stochastic optimization problem are derived, the growth maximizing and welfare-maximizing government expenditure policies are obtained and their standing in conflict or coincidence depends upon intertemporal substitution. Second, the explicit solutions of the representative individual's stochastic optimization problem which permits to tax on capital income and labor income separately are derived ,and it is found that the effect of risk on growth crucially depends on the degree of risk aversion,the intertemporal elasticity of substitution and the capital income share. Finally, a flexible optimal tax policy which can be internally adjusted to a certain extent is derived, and it is found that the distribution of factor income plays an important role in designing the optimal tax policy.
基金supported by the National Natural Science Foundation of China(Grant Nos.61174021 and 61104155)the Fundamental Research Funds for theCentral Universities,China(Grant Nos.JUDCF13037 and JUSRP51322B)+1 种基金the Programme of Introducing Talents of Discipline to Universities,China(GrantNo.B12018)the Jiangsu Innovation Program for Graduates,China(Grant No.CXZZ13-0740)
文摘This paper aims to improve the performance of a class of distributed parameter systems for the optimal switching of actuators and controllers based on event-driven control. It is assumed that in the available multiple actuators, only one actuator can receive the control signal and be activated over an unfixed time interval, and the other actuators keep dormant. After incorporating a state observer into the event generator, the event-driven control loop and the minimum inter-event time are ultimately bounded. Based on the event-driven state feedback control, the time intervals of unfixed length can be obtained. The optimal switching policy is based on finite horizon linear quadratic optimal control at the beginning of each time subinterval. A simulation example demonstrate the effectiveness of the proposed policy.
基金Funded by National Social Sciences Fund for Young Scholar ( No.020JY027)
文摘This paper presents an optimal production model for manufacturer in a supply chain with a fixed demand at a fixed interval with respect to the learning effect on production capacity. An algorithm is employed to find the optimal delay time for production and production time sequentially. It is found that the optimal delay time for production and the production time are not static, but dynamic and variant with time. It is important for a manufacturer to schedule the production so as to prevent facilities and workers from idling.
基金supported by the National Social Science Fund of China (19BGL100)。
文摘This paper studies the optimal policy for joint control of admission, routing, service, and jockeying in a queueing system consisting of two exponential servers in parallel.Jobs arrive according to a Poisson process.Upon each arrival, an admission/routing decision is made, and the accepted job is routed to one of the two servers with each being associated with a queue.After each service completion, the servers have an option of serving a job from its own queue, serving a jockeying job from another queue, or staying idle.The system performance is inclusive of the revenues from accepted jobs, the costs of holding jobs in queues, the service costs and the job jockeying costs.To maximize the total expected discounted return, we formulate a Markov decision process(MDP) model for this system.The value iteration method is employed to characterize the optimal policy as a hedging point policy.Numerical studies verify the structure of the hedging point policy which is convenient for implementing control actions in practice.
文摘To investigate the equilibrium relationships between the volatility of capital and income, taxation, and ance in a stochastic control model, the uniqueness of the solution to this model was proved by using the method of dynamic programming under the introduction of distributive disturbance and elastic labor supply. Furthermore, the effects of two types of shocks on labor-leisure choice, economic growth rate and welfare were numerically analyzed, and then the optimal tax policy was derived.
文摘In this paper, the exploitation of single population modelled by Richards model is studied. By choosing the maximum annual-sustainable yield as management objective, we investigate the optimal harvesting policies for autonomous and periodic exploited Richards model. Further, when the functions in the exploited Richards model are stably bounded functions, we study the ultimately optimal harvesting policy and obtain the corresponding average limiting maximum sustainable yield.
文摘I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete-time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation and some incomplete information concerned with the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one option from them when the imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy minimizing a total expected discounted cost for an unbounded horizon. Then two stochastic orders are used for the analysis of our problem.
文摘Federated learning enables data owners in the Internet of Things(IoT)to collaborate in training models without sharing private data,creating new business opportunities for building a data market.However,in practical operation,there are still some problems with federated learning applications.Blockchain has the characteristics of decentralization,distribution,and security.The blockchain-enabled federated learning further improve the security and performance of model training,while also expanding the application scope of federated learning.Blockchain has natural financial attributes that help establish a federated learning data market.However,the data of federated learning tasks may be distributed across a large number of resource-constrained IoT devices,which have different computing,communication,and storage resources,and the data quality of each device may also vary.Therefore,how to effectively select the clients with the data required for federated learning task is a research hotspot.In this paper,a two-stage client selection scheme for blockchain-enabled federated learning is proposed,which first selects clients that satisfy federated learning task through attribute-based encryption,protecting the attribute privacy of clients.Then blockchain nodes select some clients for local model aggregation by proximal policy optimization algorithm.Experiments show that the model performance of our two-stage client selection scheme is higher than that of other client selection algorithms when some clients are offline and the data quality is poor.
文摘This paper aim is to examine the optimal pricing and return policies for false failure returns in a dual-channel supply chain. Four prevailing return policies in which a manufacturer both operates an E-shop and sells its product through a brick-and-mortar retailer are analyzed, i.e. (I) the manufacturer handlings E-shop's returns, while the retailer addresses brick-and-mortar store's returns (NR); (II) the retailer tackles the whole (both E-shop's and brick-and-mortar store's) returns (ORR); (III) the manufacturer tackles the whole returns (ORM); and (IV) the manufacturer and the retailer are jointly responsible for the whole returns (RRM). Firstly, the optimal pricing and return policies comparing these four scenarios under uniform-pricing strategy are presented. Our conclusions show that the ORR is an optimal return policy. Compared with the NR, consumers will get a lower product pricing under the ORR and a higher product pricing under the ORM. With regard to the RRM, the product pricing is depended on consumer preference, return-rates of the E-shop and the brick-and-mortar store. Then, the optimal pricing and return policies are analyzed under differential-pricing strategy by conducting two-stage sequential games between the manufacturer and the retailer. The findings show that if consumers in the market prefer to purchase via the E-shop, the ORR is an optimal return policy. Otherwise, the NR is the optimal return policy. Compared with the NR, the ORR retailer's product pricing will rely on the retailer's and the manufacturer's return-costs; the RRM retailer's product pricing will depend on the return-costs of the retailer and the manufacturer, the return-rates of the E-shop and the brick-and-mortar store and so on. Finally, the influences of the manufacturer and the retailer establishing a Buy-back contract are discussed. Our results illustrated that the Buy-back contract doesn't affect optimal pricing and return policies under both the uniform and the differential pricing strategies.
基金Supported by the National Natural Science Foundation of China(71571019).
文摘Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities.In practice,some transition probabilities may be uncertain.The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities.Our research yields powerful contributions for Markov decision processes(MDPs)with uncertain transition probabilities.We first propose a method for estimating unknown transition probabilities based on maximum likelihood.Since the estimation may be far from accurate,and the highest expected total reward of the MDP may be sensitive to these transition probabilities,we analyze the robustness of an optimal policy and propose an approach for robust analysis.After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers,we formulate a model to obtain the optimal policy.Finally,we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds.Numerical examples are given to show the practicability of our methods.
基金the Project of National Natural Science Foundation of China(Grant No.62106283)the Project of National Natural Science Foundation of China(Grant No.72001214)to provide fund for conducting experimentsthe Project of Natural Science Foundation of Shaanxi Province(Grant No.2020JQ-484)。
文摘The scale of ground-to-air confrontation task assignments is large and needs to deal with many concurrent task assignments and random events.Aiming at the problems where existing task assignment methods are applied to ground-to-air confrontation,there is low efficiency in dealing with complex tasks,and there are interactive conflicts in multiagent systems.This study proposes a multiagent architecture based on a one-general agent with multiple narrow agents(OGMN)to reduce task assignment conflicts.Considering the slow speed of traditional dynamic task assignment algorithms,this paper proposes the proximal policy optimization for task assignment of general and narrow agents(PPOTAGNA)algorithm.The algorithm based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning(DRL)adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to solve the problem of low training efficiency.Finally,simulation experiments are carried out in the digital battlefield.The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm can obtain higher rewards faster and has a higher win ratio.By analyzing agent behavior,the efficiency,superiority and rationality of resource utilization of this method are verified.
基金supported in part by the National Natural Science Foundation of China under grants 61901078,61771082,61871062,and U20A20157in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under grant KJQN201900609+2 种基金in part by the Natural Science Foundation of Chongqing under grant cstc2020jcyj-zdxmX0024in part by University Innovation Research Group of Chongqing under grant CXQT20017in part by the China University Industry-University-Research Collaborative Innovation Fund(Future Network Innovation Research and Application Project)under grant 2021FNA04008.
文摘To guarantee the heterogeneous delay requirements of the diverse vehicular services,it is necessary to design a full cooperative policy for both Vehicle to Infrastructure(V2I)and Vehicle to Vehicle(V2V)links.This paper investigates the reduction of the delay in edge information sharing for V2V links while satisfying the delay requirements of the V2I links.Specifically,a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and ensure the fairness of a single user,respectively.A multi-agent reinforcement learning framework is designed to solve these two problems,where a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework.Thereafter,a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward.The effectiveness of the proposed approach is finally validated by comparing the obtained results with those of the other baseline approaches through extensive simulation experiments.
基金supported by Fujian Province Nature Science Foundation under Grant No.2018J01553.
文摘Directly applying the B-spline interpolation function to process plate cams in a computer numerical control(CNC)system may produce verbose tool-path codes and unsmooth trajectories.This paper is devoted to addressing the problem of B-splinefitting for cam pitch curves.Considering that the B-spline curve needs to meet the motion law of the follower to approximate the pitch curve,we use the radial error to quantify the effects of thefitting B-spline curve and the pitch curve.The problem thus boils down to solving a difficult global optimization problem tofind the numbers and positions of the control points or data points of the B-spline curve such that the cumulative radial error between thefitting curve and the original curve is minimized,and this problem is attempted in this paper with a double deep Q-network(DDQN)reinforcement learning(RL)algorithm with data points traceability.Specifically,the RL envir-onment,actions set and current states set are designed to facilitate the search of the data points,along with the design of the reward function and the initialization of the neural network.The experimental results show that when the angle division value of the actions set isfixed,the proposed algorithm can maximize the number of data points of the B-spline curve,and accurately place these data points to the right positions,with the minimum average of radial errors.Our work establishes the theoretical foundation for studying splinefitting using the RL method.
基金Supported by the National 863 Program (No.2007AA-701210)
文摘In communication networks with policy-based Transport Control on-Demand (TCoD) function,the transport control policies play a great impact on the network effectiveness. To evaluate and optimize the transport policies in communication network,a policy-based TCoD network model is given and a comprehensive evaluation index system of the network effectiveness is put forward from both network application and handling mechanism perspectives. A TCoD network prototype system based on Asynchronous Transfer Mode/Multi-Protocol Label Switching (ATM/MPLS) is introduced and some experiments are performed on it. The prototype system is evaluated and analyzed with the comprehensive evaluation index system. The results show that the index system can be used to judge whether the communication network can meet the application requirements or not,and can provide references for the optimization of the transport policies so as to improve the communication network effectiveness.
文摘According to this paper, the dragon-shape strategy is the optimized option of China's future strategy with respect to the geographic distribution of regional economy.