Funding: Supported by the National Natural Science Foundation of China (Nos. 11731013, U19B2040, 11991022) and the Leading Project of the Chinese Academy of Sciences (Nos. XDA27010102, XDA27010302).
Abstract: Stochastic gradient descent (SGD) is one of the most common optimization algorithms used in pattern recognition and machine learning. This algorithm and its variants are the preferred choice for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous studies on the convergence of these algorithms were based on traditional assumptions from optimization theory. However, deep neural networks have their own distinctive properties, and some of these assumptions are inappropriate for the actual optimization process of such models. In this paper, we modify the assumptions to make them more consistent with the actual optimization process of deep neural networks. Based on the new assumptions, we study the convergence and the convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a common network architecture, on the MNIST data set to verify the reasonableness of our assumptions.
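To make the objects of such an analysis concrete, here is a minimal sketch of the plain SGD update and of a momentum variant of the kind these convergence studies typically cover; the function names, learning rate, and momentum coefficient are illustrative assumptions, not taken from the paper.

import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain SGD: move against a stochastic (mini-batch) gradient estimate.
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Momentum (heavy-ball) variant: accumulate a velocity term first.
    v = beta * v + grad
    return w - lr * v, v

When training a network such as LeNet-5, grad would be the gradient of the loss on a random mini-batch; that mini-batch sampling is the source of the stochasticity the convergence analysis has to handle.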
Funding: Supported by the National Key R&D Program of China (No. 2021YFA1000403), the National Natural Science Foundation of China (Nos. 11731013, 12101334 and U19B2040), the Natural Science Foundation of Tianjin (No. 21JCQNJC00030), and the Fundamental Research Funds for the Central Universities.
Abstract: Numerous intriguing optimization problems arise as a result of the advancement of machine learning. Stochastic first-order methods are the predominant choice for these problems because of their high efficiency. However, the negative effects of noisy gradient estimates and the high nonlinearity of the loss function result in a slow convergence rate. Second-order algorithms have well-known advantages in dealing with highly nonlinear and ill-conditioned problems. This paper reviews recent developments in stochastic variants of quasi-Newton methods, which construct Hessian approximations using only gradient information. We concentrate on BFGS-based methods in stochastic settings and highlight the algorithmic improvements that enable them to work in various scenarios. Future research on stochastic quasi-Newton methods should focus on enhancing their applicability, lowering the computational and storage costs, and improving the convergence rate.
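As a reference point for the methods surveyed, a minimal sketch of the classic BFGS update of the inverse Hessian approximation built from gradient information; in the stochastic variants the review covers, the curvature pair comes from mini-batch gradients (often with damping or skipping safeguards), which this deterministic sketch omits.

import numpy as np

def bfgs_inverse_update(H, s, y):
    # BFGS update of the inverse Hessian approximation H, using the step
    # s = x_new - x_old and the gradient difference y = g_new - g_old.
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)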
Funding: Supported by the National Natural Science Foundation of China (Nos. 11101420, 11331012, 71271204).
Abstract: In this paper, two PVD-type algorithms are proposed for solving inseparable linearly constrained optimization problems. Instead of computing the residual gradient function, the new algorithm uses reduced gradients to construct the PVD directions in parallel computation, which greatly reduces the amount of computation per iteration and brings the method closer to practical use for large-scale nonlinear programming. Moreover, based on an active set computed by coordinate rotation at each iteration, a feasible descent direction can easily be obtained by the extended reduced gradient method. This direction is then used as the PVD direction, and a new PVD algorithm is proposed for general linearly constrained optimization. Global convergence is also proved.
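As a rough, assumed illustration of the parallel-variable-distribution pattern only (it does not implement the paper's reduced-gradient directions, active set, or feasibility handling), each worker refines its own block of variables between synchronization points:

import numpy as np

def pvd_sweep(x, grad_fn, blocks, lr=0.1, inner_steps=5):
    # Parallelization step: each worker updates only its own block of
    # variables, the other blocks staying frozen at the sync point x.
    results = []
    for idx in blocks:
        xb = x.copy()
        for _ in range(inner_steps):
            xb[idx] -= lr * grad_fn(xb)[idx]
        results.append((idx, xb[idx]))
    # Synchronization step: assemble the blockwise updates into one iterate.
    x_new = x.copy()
    for idx, piece in results:
        x_new[idx] = piece
    return x_new

For example, blocks = [np.arange(0, 5), np.arange(5, 10)] would split a 10-dimensional variable between two workers.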
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11571015 and 11331012, the Open Project of the Key Laboratory of Big Data Mining and Knowledge Management, and the Knowledge Innovation Program of the Chinese Academy of Sciences under Grant No. KGCX2-RW-329.
Abstract: In this paper the definition of domination is generalized to the case where the elements of the traffic matrices may take negative values. It is proved that D3 dominates D3 + λ(D2 - D1) for any λ ≥ 0 if D1 dominates D2. Let u(D) be the set of all traffic matrices dominated by the traffic matrix D. It is shown that u(D∞) and u(D∈) are isomorphic. In addition, similar results are obtained for multi-commodity flow problems. Furthermore, the results are generalized to integral flows.
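Writing D_1 \succeq D_2 for "D_1 dominates D_2" (the symbol is our shorthand, not the paper's notation), the main claim can be stated as:

\[
  D_1 \succeq D_2 \;\Longrightarrow\; D_3 \succeq D_3 + \lambda\,(D_2 - D_1) \quad \text{for all } \lambda \ge 0 .
\]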
Funding: Supported by the National Natural Science Foundation of China (No. 71271204) and the Knowledge Innovation Program of the Chinese Academy of Sciences (No. kjcx-yw-s7).
Abstract: In this paper, a domain in a cube is called a coverage hole if it is not covered by the largest component of the random geometric graph in that cube. We obtain asymptotic properties of the size of the largest coverage hole in the cube. In addition, we give an exponentially decaying tail bound for the probability that a line segment of length s does not intersect the coverage of the infinite component of continuum percolation. These results have applications in communication networks, especially in wireless ad-hoc sensor networks.
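The exponential tail bound has the generic form below; the constants C, c > 0 are placeholders for illustration, not the paper's explicit values:

\[
  \Pr\bigl[\text{a segment of length } s \text{ misses the coverage of the infinite component}\bigr] \le C\,e^{-c s}.
\]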
Abstract: Optimization is the theoretical and algorithmic basis of artificial intelligence. Professor Michael Jordan, a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences, pointed out in his one-hour lecture at the International Congress of Mathematicians that the changes of the artificial intelligence era come from optimization algorithms in application scenarios. Machine learning, in turn, offers many new ideas for optimization models and algorithms. The history of machine learning is a process of fusing different learning models with optimization algorithms.
Funding: This research was supported by the National Natural Science Foundation of China (Nos. 19571001, 19971002, 79970014), a Cross-century Excellent Personnel Award, and the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education of the Ministry of Education.
Abstract: Presents a study which proposed a superlinearly convergent algorithm of sequential systems of linear equations for nonlinear optimization problems with inequality constraints. Topics include the assumptions; a discussion of lemmas about several matrices related to the common coefficient matrix F; a strengthening of the regularity assumptions on the functions involved; and numerical experiments.