In this paper, we present the proximal-proximal-gradient method (PPG), a novel optimization method that is simple to implement and simple to parallelize. PPG generalizes the proximal-gradient method and ADMM and is applicable to minimization problems written as a sum of many differentiable and many non-differentiable convex functions. The non-differentiable functions can be coupled. We furthermore present a related stochastic variation, which we call stochastic PPG (S-PPG). S-PPG can be interpreted as a generalization of Finito and MISO to sums of many coupled non-differentiable convex functions. We present many applications that can benefit from PPG and S-PPG and prove convergence for both methods. We demonstrate the empirical effectiveness of both methods through experiments on a CUDA GPU. A key strength of PPG and S-PPG, compared to existing methods, is their ability to directly handle a large sum of non-differentiable, non-separable functions with a constant stepsize that is independent of the number of functions. Such non-diminishing stepsizes allow the methods to be fast.
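To make the problem template concrete, below is a minimal, illustrative sketch in Python. The abstract itself does not state the update rule; the iteration here follows the Davis-Yin-style splitting we recall from the full paper, and everything else is our own assumption for illustration: the toy lasso instance, the choices f_i(x) = 0.5*(a_i^T x - b_i)^2, g_i = 0, and r(x) = lam*||x||_1, the helper names prox_l1 and grad_f_i, and the conservative stepsize alpha = 1/L.

```python
import numpy as np

# Hypothetical toy instance of the template the abstract describes:
#   minimize_x  r(x) + (1/n) * sum_{i=1..n} ( f_i(x) + g_i(x) )
# with f_i(x) = 0.5*(a_i^T x - b_i)^2 (differentiable), g_i = 0,
# and r(x) = lam*||x||_1 -- i.e., a lasso problem split row by row.
rng = np.random.default_rng(0)
n, d = 200, 50
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.1 * rng.standard_normal(n)
lam = 0.1

def prox_l1(v, t):
    # Proximal operator of t*lam*||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

def grad_f_i(i, x):
    # Gradient of f_i(x) = 0.5*(A[i] @ x - b[i])**2.
    return A[i] * (A[i] @ x - b[i])

# Constant stepsize independent of n -- the property the abstract highlights.
# We conservatively take alpha = 1/L with L = max_i Lipschitz(grad f_i).
L = np.max(np.sum(A ** 2, axis=1))
alpha = 1.0 / L

z = np.zeros((n, d))  # one auxiliary iterate z_i per summand
for k in range(2000):
    x_half = prox_l1(z.mean(axis=0), alpha)  # x^{k+1/2} = prox_{alpha*r}(mean of z_i)
    for i in range(n):                       # this loop is what parallelizes (e.g., on a GPU)
        u = 2.0 * x_half - z[i] - alpha * grad_f_i(i, x_half)
        x_i = u                              # prox_{alpha*g_i}(u) = u, since g_i = 0 here
        z[i] += x_i - x_half                 # z_i^{k+1} = z_i^k + x_i^{k+1} - x^{k+1/2}

print("relative error:", np.linalg.norm(x_half - x_true) / np.linalg.norm(x_true))
```

As a sanity check, with g_i = 0 the z_i update collapses to z_i = x_half - alpha * grad_f_i(x_half), so averaging the z_i recovers exactly a proximal-gradient (ISTA) step; this matches the abstract's claim that PPG generalizes the proximal-gradient method.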