Funding: Supported by the Elite Program of Computational and Applied Mathematics for PhD Candidates of Peking University; supported in part by the National Science Foundation of USA (Grant No. DMS-1819157); and by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics Program (Grant No. DE-SC0014400).
Abstract: We develop a unified model, known as MgNet, that simultaneously recovers some convolutional neural networks (CNN) for image classification and multigrid (MG) methods for solving discretized partial differential equations (PDEs). This model is based on close connections that we have observed and uncovered between the CNN and MG methodologies. For example, the pooling operation and feature extraction in CNN correspond directly to the restriction operation and iterative smoothers in MG, respectively. As the solution space is often the dual of the data space in PDEs, the analogous concepts of feature space and data space (which are dual to each other) are introduced in CNN. With these connections and new concepts in the unified model, the function of the various convolution and pooling operations used in CNN can be better understood. As a result, modified CNN models (with fewer weights and hyperparameters) are developed that exhibit competitive and sometimes better performance in comparison with existing CNN models when applied to both the CIFAR-10 and CIFAR-100 data sets.
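The MG side of the correspondence named in the abstract can be illustrated with a minimal sketch. It is not the paper's MgNet code: the 1D unscaled Laplacian stencil, the damped Jacobi smoother, the full-weighting restriction, and the function names `apply_A`, `smooth`, and `restrict` are all assumptions chosen for illustration. The point is only that the smoother is built from a fixed convolution (the analogue of feature extraction) and the restriction is a stride-2 convolution (the analogue of pooling).

```python
import numpy as np

def apply_A(u):
    """Apply the 1D stencil [-1, 2, -1] with zero Dirichlet boundaries (a convolution)."""
    padded = np.pad(u, 1)
    return -padded[:-2] + 2.0 * padded[1:-1] - padded[2:]

def smooth(u, f, steps=3, omega=2.0 / 3.0):
    """Damped Jacobi smoother: u <- u + omega * D^{-1} (f - A u), where D = 2 I."""
    for _ in range(steps):
        u = u + omega * (f - apply_A(u)) / 2.0
    return u

def restrict(r):
    """Full-weighting restriction: stencil [1/4, 1/2, 1/4] applied with stride 2."""
    padded = np.pad(r, 1)
    full = 0.25 * padded[:-2] + 0.5 * padded[1:-1] + 0.25 * padded[2:]
    return full[1::2]  # keep every other interior point (the coarse grid)

# Smooth on a fine grid of 7 interior points, then restrict the residual to 3 points.
f = np.ones(7)
u = smooth(np.zeros(7), f)
coarse_residual = restrict(f - apply_A(u))
```

In this sketch the coarse residual lives on a grid of half the resolution, much as a pooled feature map does in a CNN.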
Funding: This work is partially supported by Beijing International Center for Mathematical Research, the Elite Program of Computational and Applied Mathematics for PhD Candidates of Peking University, NSFC Grant 91430215, and NSF Grants DMS-1522615 and DMS-1819157.
Abstract: In this paper, we investigate the relationship between deep neural networks (DNN) with the rectified linear unit (ReLU) function as the activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of CPWL in FEM. We theoretically establish that at least 2 hidden layers are needed in a ReLU DNN to represent any linear finite element function in Ω ⊆ R^d when d ≥ 2. Consequently, for d = 2, 3, which are often encountered in scientific and engineering computing, two hidden layers are necessary and sufficient for any CPWL function to be represented by a ReLU DNN. Then we include a detailed account of how a general CPWL in R^d can be represented by a ReLU DNN with at most ⌈log2(d+1)⌉ hidden layers, and we also give an estimate of the number of neurons in the DNN that are needed in such a representation. Furthermore, using the relationship between DNN and FEM, we theoretically argue that a special class of DNN models with low bit-width is still expected to have adequate representation power in applications. Finally, as a proof of concept, we present some numerical results for using ReLU DNNs to solve a two-point boundary problem to demonstrate the potential of applying DNN to the numerical solution of partial differential equations.
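The idea of writing a nodal basis function as a ReLU network is easiest to see in one dimension, which is simpler than the d ≥ 2 setting the abstract focuses on. The sketch below is an illustration under that assumption, not the paper's construction: the function names `relu` and `hat` and the uniform mesh size `h` are hypothetical. In 1D the hat function at a node is exactly a combination of three ReLU units, so a single hidden layer suffices there.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(x, 0.0)

def hat(x, center, h):
    """1D linear FEM nodal basis function at `center` on a uniform mesh of size h,
    written as one hidden ReLU layer with three neurons."""
    return (relu(x - (center - h))
            - 2.0 * relu(x - center)
            + relu(x - (center + h))) / h

# Evaluate the basis function at the node, at its neighbours, and at a midpoint.
x = np.array([0.25, 0.5, 0.625, 0.75, 1.0])
values = hat(x, center=0.5, h=0.25)
```

The function equals 1 at its own node, 0 at all other nodes, and is linear in between, which is the defining property of a linear finite element nodal basis function.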