Abstract
Correlating attributes is difficult when simulating tabular data. To address this, an H model is proposed that describes the correlation among non-temporal attributes in tabular data. First, the key attributes of the evaluating subject and the evaluated subject are extracted from the data set, and twofold frequency statistics yield four relationship pairs over those key attributes. Then, the Maximum Information Coefficient (MIC) of each pair is calculated to assess its correlation, and each pair is fitted with a Stretched Exponential (SE) distribution. Finally, the data scales of the evaluating and evaluated subjects are set, the activity of the evaluating subjects and the popularity of the evaluated subjects are calculated from the fitted relationships, and the two sides are linked by requiring the total activity to equal the total popularity, which gives the H model of non-temporal attribute correlation. Experimental results show that the H model effectively describes the correlation characteristics of non-temporal attributes in real data sets.
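The counting step and the sum constraint can be illustrated with a minimal sketch. It assumes one plausible reading of "twofold frequency statistics": a first pass counts records per subject, and a second pass counts how many subjects share each count. The function name `twofold_frequency` and the toy `records` table are hypothetical and do not come from the paper. MIC scoring and SE fitting are not shown.

```python
from collections import Counter


def twofold_frequency(values):
    """Return (records per key, number of keys sharing each record count).

    Assumed interpretation of "twofold frequency statistics";
    the paper's exact procedure may differ.
    """
    first = Counter(values)            # key -> number of records
    second = Counter(first.values())   # record count -> number of keys
    return first, second


# Hypothetical toy table of (evaluating subject, evaluated subject) records.
records = [
    ("u1", "i1"), ("u1", "i2"), ("u1", "i3"),
    ("u2", "i1"), ("u2", "i2"),
    ("u3", "i1"),
]

activity, activity_hist = twofold_frequency(u for u, _ in records)
popularity, popularity_hist = twofold_frequency(i for _, i in records)

# Each record adds one to exactly one subject's activity and to exactly
# one evaluated subject's popularity, so both totals equal len(records).
total_activity = sum(activity.values())
total_popularity = sum(popularity.values())
```

In any such table the two totals coincide by construction, which is why a simulated data set can be tied together by generating activities and popularities separately and then requiring their sums to match.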
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2017, No. 9, pp. 2684-2688 (5 pages)
Journal of Computer Applications
Funding
Major Project of the Fujian Provincial Science and Technology Plan (2016H6007)
Fuzhou City-University Cooperation Project (2016-G-40)
Keywords
data simulation
correlation
Maximum Information Coefficient (MIC)
Stretched Exponential (SE) distribution
attribute correlation