Abstract
Massive, multi-angle unstructured big data generated from digital information can lose information because of external interference, damaged data structures, and other factors. To address this problem, a missing value imputation algorithm for unstructured big data based on transfer learning is proposed. A transfer learning algorithm predicts where values are missing in the unstructured big data. A naive Bayes algorithm then classifies the data features and measures the weights between attributes, which determines the feature difference vector of each data category and identifies the degree of feature difference. A kernel regression model applies a nonlinear mapping to the missing part of the data, and a polynomial transformation encoding describes the cross-space complementary conditions of the data, completing the imputation of missing values in the unstructured big data. Experimental results show that the proposed algorithm effectively imputes missing values in unstructured big data, achieves a good imputation effect, and improves imputation accuracy.
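The abstract names kernel regression as the step that fills in the missing part of the data but gives no formulas. As a rough illustration only, the sketch below uses a generic Nadaraya-Watson estimator with a Gaussian kernel to impute one numeric column from the others. It is not the authors' method: the function name `kernel_regression_impute`, the `bandwidth` parameter, and the assumption that all other columns are fully observed are hypothetical choices made for this example, and the transfer learning, naive Bayes weighting, and polynomial encoding stages are not modeled.

```python
import numpy as np


def kernel_regression_impute(X, target_col, bandwidth=1.0):
    """Fill NaNs in one column with a Nadaraya-Watson (Gaussian kernel) estimate.

    Illustrative assumption: every column other than `target_col` is fully
    observed, so it can serve as a predictor.
    """
    X = np.asarray(X, dtype=float).copy()
    feature_cols = [j for j in range(X.shape[1]) if j != target_col]

    missing = np.isnan(X[:, target_col])
    observed = ~missing
    if not missing.any() or not observed.any():
        return X  # nothing to impute, or nothing to learn from

    train_x = X[observed][:, feature_cols]
    train_y = X[observed, target_col]

    for i in np.where(missing)[0]:
        # Squared Euclidean distance from the incomplete row to each complete row.
        d2 = np.sum((train_x - X[i, feature_cols]) ** 2, axis=1)
        # Gaussian kernel weights; subtracting the minimum distance leaves the
        # weighted average unchanged and avoids underflow to all-zero weights.
        w = np.exp(-(d2 - d2.min()) / (2.0 * bandwidth ** 2))
        # Kernel-weighted average of the observed target values.
        X[i, target_col] = np.dot(w, train_y) / w.sum()
    return X


if __name__ == "__main__":
    data = np.array([
        [0.0, 0.0],
        [1.0, 2.0],
        [2.0, 4.0],
        [3.0, 6.0],
        [1.5, np.nan],  # value to impute
    ])
    print(kernel_regression_impute(data, target_col=1))
```

A smaller `bandwidth` makes the estimate depend on the nearest complete rows, while a larger one pulls it toward the column mean. In practice this value would need tuning, for example by cross-validation on artificially masked entries.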
Authors
颜远海
杨莉云
YAN Yuanhai; YANG Liyun (College of Data Science, Guangzhou Huashang College, Zengcheng 511300, China)
Source
《吉林大学学报(信息科学版)》
CAS
2024, No. 2, pp. 372-377 (6 pages)
Journal of Jilin University (Information Science Edition)
Funding
Supported by the Innovation and Strong School Project fund (2017KQNCX266).
Keywords
transfer learning
unstructured big data
imputation of missing values
missing value prediction
kernel regression function