Abstract
To reduce the computational complexity of the maximum information coefficient (MIC) and achieve the best balance between computational accuracy and computational cost, this study uses gene-disease correlation experiments to investigate a suitable value interval and the optimal value of the MIC threshold. The results show the following. First, the suitable threshold interval can be estimated from the frequencies of strongly correlated and uncorrelated data between variables, and from how these frequencies change across thresholds. Second, the optimal threshold can be estimated as the minimum of the set of upper bounds of these threshold intervals. Third, the optimal threshold differs between variables, and it tends to decrease as the number of samples increases.
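The abstract states that the optimal threshold is estimated as the minimum of the set of interval upper bounds (a min-max style aggregation). The sketch below shows only that aggregation step, assuming the per-experiment suitable intervals have already been estimated; the function name `optimal_threshold`, the `(lower, upper)` tuple representation, and the extra consistency check are illustrative assumptions, not the paper's implementation.

```python
from typing import Sequence, Tuple


def optimal_threshold(intervals: Sequence[Tuple[float, float]]) -> Tuple[float, bool]:
    """Estimate the optimal MIC threshold from suitable threshold intervals.

    Hypothetical input: one (lower, upper) interval per experiment, e.g. per
    variable pair or per sample size. Returns the minimum upper bound and a
    flag saying whether that value also lies at or above every lower bound,
    i.e. whether it falls inside all of the intervals.
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    for lo, hi in intervals:
        if lo > hi:
            raise ValueError(f"invalid interval ({lo}, {hi})")
    # Take the minimum over the set of upper bounds.
    best = min(hi for _, hi in intervals)
    # Assumed sanity check: the chosen value should not fall below any lower bound.
    consistent = all(lo <= best for lo, _ in intervals)
    return best, consistent
```

For example, with made-up intervals `[(0.20, 0.45), (0.25, 0.40), (0.18, 0.50)]` the function returns `(0.40, True)`. The numbers are placeholders, not results from the paper.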
Author
TAN Zaowen (Academy of National Space Planning, Hualan Design (Group) Co., Ltd., Nanning 530011, China)
Source
Modern Information Technology, 2023, No. 24, pp. 77-81 (5 pages)
Keywords
maximum information coefficient
mutual information
correlation
threshold
Min-Max strategy