Abstract
Training deep learning models requires large amounts of labeled data, yet most real-world data is unlabeled and must be annotated by hand, and training on small samples is prone to overfitting. To address this problem, this paper proposes an algorithm that first uses a sparse encoder to reduce the dimensionality of the data, then applies the t-SNE algorithm to reduce the data further to a two-dimensional space, and finally clusters the data with a Gaussian mixture model. The algorithm is unsupervised and does not require the data to be labeled in advance. It shows a degree of generalization ability in the face of overfitting, achieving an accuracy of 0.89205 on the training set of a handwritten data set and 0.896 on the test set. The algorithm offers a new approach to learning from small samples.
Deep learning requires a large number of labeled samples for training. In reality, it is difficult to collect a large number of labeled samples, and this also brings difficulties to unsupervised learning. Meta-learning aims to solve the classification problem for small samples, but too few samples lead to overfitting, among other problems. This paper proposes a method that reduces data dimensionality with an autoencoder and then clusters the data with k-means, obtaining 0.89205 on the training set and 0.896 on the test set.
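The final stage described in the abstract, clustering two-dimensional points with a Gaussian mixture model, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic 2-D blobs as a stand-in for the t-SNE output, spherical covariances, a farthest-point initialisation, and a majority-vote mapping from clusters to labels for scoring, none of which are specified in the abstract. All function names (`make_blobs`, `fit_gmm`, `majority_vote_accuracy`) are hypothetical.

```python
import math
import random

def make_blobs(centers, n_per, sigma, rng):
    """Synthetic 2-D points; an assumed stand-in for t-SNE output."""
    pts, labels = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(n_per):
            pts.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(k)
    return pts, labels

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def init_means(pts, k):
    """Deterministic farthest-point initialisation (an illustrative choice)."""
    means = [pts[0]]
    while len(means) < k:
        means.append(max(pts, key=lambda p: min(sq_dist(p, m) for m in means)))
    return means

def fit_gmm(pts, k, n_iter=50):
    """EM for a mixture of spherical 2-D Gaussians; returns hard assignments."""
    n = len(pts)
    means = init_means(pts, k)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for p in pts:
            logs = [math.log(weights[j]) - math.log(2 * math.pi * variances[j])
                    - sq_dist(p, means[j]) / (2 * variances[j]) for j in range(k)]
            top = max(logs)
            ex = [math.exp(v - top) for v in logs]
            total = sum(ex)
            resp.append([e / total for e in ex])
        # M-step: update mixing weights, means and per-component variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            mx = sum(r[j] * p[0] for r, p in zip(resp, pts)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, pts)) / nj
            means[j] = (mx, my)
            var = sum(r[j] * sq_dist(p, means[j]) for r, p in zip(resp, pts)) / (2 * nj)
            variances[j] = max(var, 1e-6)
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def majority_vote_accuracy(clusters, labels, k):
    """Map each cluster to its most common true label, then score."""
    correct = 0
    for j in range(k):
        members = [lab for c, lab in zip(clusters, labels) if c == j]
        if members:
            correct += max(members.count(v) for v in set(members))
    return correct / len(labels)

rng = random.Random(0)
points, true_labels = make_blobs([(0, 0), (20, 0), (0, 20)], 100, 1.0, rng)
assignments = fit_gmm(points, k=3)  # labels are not used during fitting
accuracy = majority_vote_accuracy(assignments, true_labels, 3)
print(f"clustering accuracy: {accuracy:.3f}")
```

The true labels are used only after clustering, to score the result, which mirrors the unsupervised setting the abstract describes. On real data the blobs would be replaced by the two-dimensional embedding produced by the earlier dimensionality-reduction stages.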
Authors
马双宝
高梦圆
胡江宇
贾树林
董玉婕
MA Shuang-bao; GAO Meng-yuan; HU Jiang-yu; JIA Shu-lin; DONG Yu-jie (School of Mechanical Engineering and Automation, Wuhan Textile University, Wuhan, Hubei 430200, China)
Source
《武汉纺织大学学报》
2021, No. 2, pp. 3-8 (6 pages)
Journal of Wuhan Textile University