Funding: Supported in part by the National Key R&D Program of China (2021YFB1714800), the National Natural Science Foundation of China (62222303, 62073035, 62173034, 61925303, 62088101, 61873033), the CAAI-Huawei MindSpore Open Fund, and the Chongqing Natural Science Foundation (2021ZX4100027).
Abstract: This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are mostly developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver by judiciously combining Nesterov's momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. It is shown that the convergence rate of the proposed algorithm is O(k^(-1/2)) for convex optimization and O(1/log_2(k)) for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.
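The abstract names the ingredients but not the recursion; the sketch below is a minimal, hypothetical illustration of one iteration that combines a momentum-averaged stochastic gradient, gradient tracking through a mixing matrix W, and a Frank-Wolfe step via a linear minimization oracle lmo. The interfaces (stoch_grad, lmo, W) and step-size choices are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def dist_stochastic_fw_step(x, g, grad_prev, W, stoch_grad, lmo, gamma, beta):
    """One illustrative iteration of a decentralized stochastic Frank-Wolfe
    update with gradient tracking; not the paper's exact recursion.

    x         : (n, d) array, local iterates of the n agents
    g         : (n, d) array, gradient-tracking variables
    grad_prev : (n, d) array, previous local stochastic gradient estimates
    W         : (n, n) doubly stochastic mixing matrix of the network
    stoch_grad: callable, stoch_grad(x_i) -> noisy local gradient at agent i
    lmo       : callable, lmo(v) -> argmin_{s in C} <v, s> (linear oracle)
    gamma     : Frank-Wolfe step size in (0, 1]
    beta      : momentum weight in (0, 1) for the local gradient estimate
    """
    n, d = x.shape
    # Momentum-averaged local stochastic gradient estimates.
    grad_new = np.stack([
        (1 - beta) * grad_prev[i] + beta * stoch_grad(x[i]) for i in range(n)
    ])
    # Gradient tracking: mix neighbors' trackers, then add the local innovation.
    g = W @ g + grad_new - grad_prev
    # Consensus (mixing) step on the iterates.
    x_mix = W @ x
    # Frank-Wolfe update using each agent's tracked gradient.
    s = np.stack([lmo(g[i]) for i in range(n)])
    x_next = x_mix + gamma * (s - x_mix)
    return x_next, g, grad_new
```

Here each agent only needs its own stochastic gradient, its neighbors' variables through W, and one call to the linear oracle per iteration, which is what makes the scheme projection-free.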
Abstract: Recently, a monotone generalized directional derivative has been introduced for Lipschitz functions. This concept has been applied to represent and optimize nonsmooth functions. The second application proved relevant for parallel computing, by allowing one to define minimization algorithms with a high degree of inherent parallelism. The paper first presents the theoretical background, namely the notions of monotone generalized directional derivative and monotone generalized subdifferential. It then defines the tools for the procedures, that is, a necessary optimality condition and a steepest descent direction. The minimization algorithms are then outlined. Finally, the architectures used and the numerical experiments performed are described, listing and commenting on the tested functions and the obtained results.
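The abstract only names its tools (a necessary optimality condition, a steepest descent direction, parallel minimization algorithms); the sketch below is a loose, hypothetical illustration of the kind of embarrassingly parallel work such methods exploit: estimating directional derivatives of a Lipschitz function along many candidate directions independently and stepping along the best descent direction found. The sampling scheme, step rule, and stopping proxy are all assumptions, not the paper's procedure.

```python
import numpy as np

def parallel_descent_step(f, x, n_dirs=64, h=1e-3, step=1e-1, rng=None):
    """Illustrative only: probe many random unit directions of a (possibly
    nonsmooth) Lipschitz function f; the directional-derivative estimates are
    mutually independent, so they could be evaluated in parallel.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    dirs = rng.normal(size=(n_dirs, x.size))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    fx = f(x)
    # Forward-difference estimates of the directional derivatives f'(x; u).
    slopes = np.array([(f(x + h * u) - fx) / h for u in dirs])
    best = slopes.argmin()
    if slopes[best] >= 0.0:
        # No sampled descent direction: crude proxy for the optimality condition.
        return x, fx
    x_new = x + step * dirs[best]
    return x_new, f(x_new)
```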
Funding: Supported by the National Natural Science Foundation of China (Nos. U1811461, 12326615).
Abstract: In this paper, we design differentially private variants of the classical Frank-Wolfe algorithm with the shuffle model for optimization in machine learning. Under weak assumptions and the generalized linear loss (GLL) structure, we propose a noisy Frank-Wolfe with shuffle model algorithm (NoisyFWS) and a noisy variance-reduced Frank-Wolfe with shuffle model algorithm (NoisyVRFWS) by adding calibrated Laplace noise under the shuffling scheme in the l_(p) (p ∈ [1, 2]) case, and study their privacy as well as utility guarantees for Hölder-smooth GLLs. In particular, the privacy guarantees are mainly achieved by using advanced composition and privacy amplification by shuffling. The utility bounds of NoisyFWS and NoisyVRFWS are analyzed, yielding optimal excess population risks of O(n^(-(1+α)/(4α)) + log(d)√(log(1/δ))/(nε)) and O(n^(-(1+α)/(4α)) + log(d)√(log(1/δ))/(n²ε)), with gradient complexity O(n^((1+α)²/(4α²))), for α ∈ [1/√3, 1]. It turns out that the risk rates under the shuffling scheme are nearly dimension-independent, which is consistent with previous work in some cases. In addition, there is a vital tradeoff between the (α, L)-Hölder smoothness of the GLL and the gradient complexity. The linear gradient complexity O(n) is attained for the parameter α = 1.
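The NoisyFWS mechanism itself is not reproduced here; the following is a minimal sketch, under assumed interfaces, of a single Frank-Wolfe step in which the averaged per-example gradient is perturbed with calibrated Laplace noise before the linear minimization oracle is called. The noise scale, the lmo callable, and the shuffling and composition accounting that drive the actual privacy guarantee are all left as assumptions.

```python
import numpy as np

def noisy_fw_step(theta, grads, lmo, gamma, lap_scale, rng=None):
    """Illustrative sketch (not the paper's NoisyFWS): one Frank-Wolfe step
    with a Laplace-perturbed gradient.

    theta     : (d,) current iterate inside the constraint set
    grads     : (n, d) per-example gradients evaluated at theta
    lmo       : callable, lmo(v) -> argmin_{s in C} <v, s>
    gamma     : Frank-Wolfe step size in (0, 1]
    lap_scale : Laplace noise scale (to be calibrated from the privacy budget)
    """
    rng = np.random.default_rng() if rng is None else rng
    g = grads.mean(axis=0)
    # Perturb the averaged gradient before it touches the linear oracle.
    g_noisy = g + rng.laplace(scale=lap_scale, size=g.shape)
    s = lmo(g_noisy)
    return theta + gamma * (s - theta)
```

Because only the (noisy) gradient enters the linear oracle and the update is a convex combination of feasible points, the iterate stays in the constraint set without any projection.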
Abstract: This paper modifies the Frank-Wolfe algorithm. Under weaker conditions, it proves that the modified algorithm is convergent; in particular, under the assumption of convexity of the objective function, convergence is established without assuming that the iterate sequence {x̄_k} is bounded.
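The abstract does not detail the modification; for reference only, here is a minimal sketch of the classical Frank-Wolfe iteration with the standard open-loop step size 2/(k+2), which is the baseline that such modifications start from. The grad and lmo callables are assumed interfaces.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, n_iters=100):
    """Minimal classical Frank-Wolfe loop; the paper's modification is not
    reproduced here.

    grad : callable, grad(x) -> gradient of the objective at x
    lmo  : callable, lmo(v) -> argmin_{s in C} <v, s> over the feasible set C
    x0   : feasible starting point
    """
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        s = lmo(grad(x))
        gamma = 2.0 / (k + 2.0)   # standard open-loop step size
        x = x + gamma * (s - x)   # convex combination keeps x feasible
    return x
```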