
Realistic Data Generation Algorithm Based on Non-Temporal Attribute Correlation

English title as published: Table Data Simulation Generating Algorithm Based on Not-Temporal Attribute
Abstract: This paper proposes a realistic data generation algorithm based on correlations between non-temporal attributes. The algorithm addresses the difficulty of building such correlations when developing data generators, and is useful for producing simulated data in big data benchmarking. First, two key non-temporal attributes are extracted from the data set, and a twofold frequency count is performed on each of them. Next, the maximal information coefficient (MIC) is computed from the two sets of counts to measure the dependence between the two variables, and a stretched exponential (SE) distribution is fitted to the relationship to build the correlation model. Finally, constraints derived from the model parameters are used to generate data in a two-dimensional matrix. Experimental results show that the algorithm effectively reproduces the data characteristics of the real data set.
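The first two steps in the abstract (twofold frequency statistics, then an MIC score) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code. `twofold_frequency` assumes "twofold frequency statistics" means counting occurrences of each value and then counting how many values share each occurrence count. `mic_approx` is a simplified MIC-style score: it tries only equal-frequency grids within the usual grid-size bound n^0.6, instead of optimizing partition positions as the original MIC definition does, so it can underestimate the true MIC. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def twofold_frequency(values):
    """Two passes of frequency counting over one attribute column.

    Assumed reading: pass 1 counts occurrences of each value; pass 2 counts
    how many distinct values share each occurrence count.
    """
    first = Counter(values)            # value -> number of records
    second = Counter(first.values())   # occurrence count -> number of values
    return first, second


def equal_freq_bins(xs, k):
    """Assign each sample to one of k roughly equal-sized bins by rank.

    Ties are split by original position, which is acceptable for a sketch.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    bins = [0] * len(xs)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(xs)
    return bins


def mutual_information(a, b):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    return sum(c / n * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in joint.items())


def mic_approx(xs, ys, alpha=0.6):
    """Simplified MIC-style score in [0, 1] (up to rounding).

    Tries every nx-by-ny grid with nx * ny <= n**alpha, bins both axes by
    equal frequency, and keeps the largest MI / log(min(nx, ny)). Real MIC
    also optimizes where the partitions fall, so this can underestimate it.
    """
    n = len(xs)
    limit = max(4, int(n ** alpha))
    best = 0.0
    for nx in range(2, limit // 2 + 1):
        bx = equal_freq_bins(xs, nx)
        for ny in range(2, limit // nx + 1):
            mi = mutual_information(bx, equal_freq_bins(ys, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return best


if __name__ == "__main__":
    first, second = twofold_frequency(["a", "a", "b", "b", "c"])
    print(dict(first), dict(second))  # {'a': 2, 'b': 2, 'c': 1} {2: 2, 1: 1}
    rng = random.Random(0)
    xs = [rng.random() for _ in range(500)]
    ys = [rng.random() for _ in range(500)]
    print(mic_approx(xs, xs))  # identical columns: score of 1
    print(mic_approx(xs, ys))  # independent columns: small score
```

Because MI is bounded by the entropy of the coarser axis, dividing by log(min(nx, ny)) keeps the score comparable across grid sizes; this is the same normalization MIC uses.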
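The model-fitting and generation steps in the abstract can also be sketched in Python. This is a hedged sketch, not the published method. It assumes the stretched exponential is used in its common survival-function form S(x) = exp(-(x/λ)^β), fitted by least squares on the linearization ln(-ln S(x)) = β ln x - β ln λ. The abstract only says that constraints derived from the model parameters govern generation in a two-dimensional matrix. `generate_matrix` illustrates one hypothetical reading: each row (a value of the first attribute) is linked to k distinct columns (values of the second attribute), and k is drawn from the fitted distribution by inverse-transform sampling. All function names and this row/column interpretation are assumptions.

```python
import bisect
import math
import random


def empirical_survival(samples):
    """(x, P(X > x)) for each distinct positive sample with 0 < P < 1."""
    data = sorted(samples)
    n = len(data)
    points = []
    for x in sorted(set(data)):
        s = (n - bisect.bisect_right(data, x)) / n
        if x > 0 and 0 < s < 1:  # ln x and ln(-ln s) must be defined
            points.append((x, s))
    return points


def fit_stretched_exponential(points):
    """Fit S(x) = exp(-(x / lam) ** beta) by least squares on the line
    ln(-ln S) = beta * ln x - beta * ln lam. Returns (beta, lam)."""
    us = [math.log(x) for x, _ in points]
    vs = [math.log(-math.log(s)) for _, s in points]
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    beta = (sum((u - mu) * (v - mv) for u, v in zip(us, vs))
            / sum((u - mu) ** 2 for u in us))
    lam = math.exp(mu - mv / beta)  # intercept = -beta * ln(lam)
    return beta, lam


def sample_stretched_exponential(beta, lam, rng):
    """Inverse-transform sample: solve exp(-(x / lam) ** beta) = u for x."""
    u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
    return lam * (-math.log(u)) ** (1.0 / beta)


def generate_matrix(n_rows, n_cols, beta, lam, seed=0):
    """Hypothetical 0/1 co-occurrence matrix: row i (a value of attribute A)
    is linked to k_i distinct columns (values of attribute B), where k_i is
    an SE sample rounded and clamped to [1, n_cols]."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_rows):
        k = round(sample_stretched_exponential(beta, lam, rng))
        k = min(n_cols, max(1, k))
        row = [0] * n_cols
        for col in rng.sample(range(n_cols), k):
            row[col] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Exact points from beta = 0.7, lam = 5 should be recovered by the fit.
    pts = [(x, math.exp(-(x / 5.0) ** 0.7)) for x in range(1, 21)]
    print(fit_stretched_exponential(pts))
    for row in generate_matrix(5, 8, 0.7, 3.0):
        print(row)
```

Fitting the exact points in the demo recovers β = 0.7 and λ = 5 up to floating-point error. With real data, `empirical_survival` would supply the points. The unweighted fit gives the sparse tail points as much weight as the bulk of the data, which a more careful implementation might correct.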
Source: Computer Systems & Applications (《计算机系统应用》), 2018, No. 2, pp. 30-36 (7 pages)
Funding: Major Project of the Fujian Provincial Science and Technology Plan (2016H6007); Fuzhou City-University Cooperation Project (2016-G-40)
Keywords: data simulation generation; correlation; maximal information coefficient (MIC); stretched exponential distribution; attribute correlation