Abstract: We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed-forward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe an example of compression through an AutoEncoder and compare against sequential feed-forward networks under several metrics.
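For context, a standard gradient descent step with momentum takes the form below; this is a generic sketch, and the particular adaptive rule for choosing the momentum coefficient analyzed in the paper is not quoted here:

\[
\Delta w^{k+1} = -\eta\, \nabla E(w^k) + \mu_k\, \Delta w^k, \qquad w^{k+1} = w^k + \Delta w^{k+1},
\]

where \(E\) is the training error, \(\eta > 0\) is the learning rate, and \(\mu_k \in [0,1)\) is the momentum coefficient, which in the adaptive setting may vary with the step \(k\).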