It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that all variables are continuous, which limits their applicability in this setting. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-class classification that handles both categorical and continuous variables. The method uses the Maximal Information Coefficient (MIC) to assess the predictive power of each variable. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening property and the ranking consistency property. We demonstrate its finite-sample performance through simulation studies and real data examples. The proposed method thus offers a way to screen features effectively in ultra-high-dimensional datasets whose covariates mix categorical and continuous types.
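The screening step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' estimator: it uses a simplified MIC that, for a continuous feature, searches only equal-frequency partitions of the feature against the fixed class partition under a grid budget of n^0.6, and, for a categorical feature, simply takes the feature's levels as its grid. The functions `approx_mic` and `mic_screen` and the cut-off `d` are hypothetical; the abstract does not state how the paper estimates MIC or how many features it retains.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def equal_frequency_bins(values, k):
    """Map each value to one of k bins holding roughly n/k points each (ties split arbitrarily)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def approx_mic(feature, labels, categorical):
    """Hypothetical simplified MIC between one feature and a class label (needs >= 2 classes)."""
    n_classes = len(set(labels))
    if categorical:
        # Categorical feature: its observed levels define the grid on the feature axis.
        n_levels = len(set(feature))
        if n_levels < 2:
            return 0.0
        return mutual_information(feature, labels) / math.log(min(n_levels, n_classes))
    # Continuous feature: try k equal-frequency bins while k * n_classes stays
    # within the assumed grid budget B(n) = n ** 0.6; keep the best normalized MI.
    budget = len(labels) ** 0.6
    best, k = 0.0, 2
    while k * n_classes <= budget:
        mi = mutual_information(equal_frequency_bins(feature, k), labels)
        best = max(best, mi / math.log(min(k, n_classes)))
        k += 1
    return best


def mic_screen(columns, labels, categorical_flags, d):
    """Rank features by approximate MIC with the label and keep the top d indices."""
    scores = [approx_mic(col, labels, flag)
              for col, flag in zip(columns, categorical_flags)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:d], scores


# Toy data: 3 classes, one informative continuous feature (index 0),
# one informative categorical feature (index 1), and 18 pure-noise features.
rng = random.Random(0)
n = 300
y = [rng.randrange(3) for _ in range(n)]
columns = [[yi + rng.gauss(0, 0.5) for yi in y],
           [yi if rng.random() < 0.8 else rng.randrange(3) for yi in y]]
flags = [False, True]
for j in range(18):
    if j % 2 == 0:
        columns.append([rng.gauss(0, 1) for _ in range(n)])
        flags.append(False)
    else:
        columns.append([rng.randrange(3) for _ in range(n)])
        flags.append(True)

selected, scores = mic_screen(columns, y, flags, d=2)
print("kept features:", selected)
```

Because the score is a normalized mutual information, continuous and categorical covariates end up on the same 0 to 1 scale, which is what lets a single ranking cover both types.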
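With the development of electricity spot markets, short-term electricity price forecasting has become important for the decision-making of market participants, while the growing integration of high shares of clean energy and energy storage makes such forecasting considerably more challenging. This paper proposes a multi-step short-term electricity price forecasting model based on the maximum information coefficient (MIC) method, ensemble empirical mode decomposition (EEMD), and an improved Informer. First, MIC is used to identify the factors most strongly correlated with the electricity price, and these serve as the model's original input series. Next, the original series are decomposed by EEMD into several intrinsic mode functions (IMFs) and one residual term; each component is fed into the improved Informer to obtain a multi-step forecast of the next day's 24 hourly points, and the forecasts are then filtered. Finally, the filtered forecasts of all components are summed to obtain the final predicted value. Validation on data from the Spanish electricity market shows that the model effectively improves the accuracy of multi-step short-term price forecasting.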
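The decompose, forecast, filter, and recombine structure can be sketched as below. Every component here is a hypothetical stand-in chosen to keep the sketch dependency-free: `decompose` peels off moving-average layers rather than performing EEMD, `seasonal_naive` repeats the last observed day instead of running the improved Informer, and the post-forecast filter is assumed to be a 3-point moving average because the abstract does not name the filter. The MIC input-selection step is omitted.

```python
import math
import random


def moving_average(x, half):
    """Centered moving average over `half` points on each side, truncated at the edges."""
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def decompose(x, halves=(1, 6, 24)):
    """Stand-in for EEMD (this is NOT empirical mode decomposition).

    Peels off successively slower detail layers; the details plus the final
    smooth residue sum exactly back to x.
    """
    components, current = [], list(x)
    for half in halves:
        trend = moving_average(current, half)
        components.append([c - t for c, t in zip(current, trend)])
        current = trend
    components.append(current)  # plays the role of the EEMD residual term
    return components


def seasonal_naive(series, horizon=24, season=24):
    """Stand-in for the improved Informer: repeat the last observed day."""
    last = series[-season:]
    return [last[h % season] for h in range(horizon)]


def forecast_next_day(prices, horizon=24):
    """Decompose, forecast each component, filter each forecast, then sum."""
    parts = decompose(prices)
    filtered = [moving_average(seasonal_naive(p, horizon), 1) for p in parts]
    return [sum(values) for values in zip(*filtered)]


# Toy hourly price series: 14 days with a daily cycle plus noise.
rng = random.Random(42)
prices = [50 + 10 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 1)
          for t in range(24 * 14)]
forecast = forecast_next_day(prices)
print([round(v, 2) for v in forecast])
```

Because both stand-ins are linear, the summed forecast here equals smoothing the last observed day directly. The decomposition only changes the result when each component gets its own nonlinear model, which is the role the improved Informer plays in the paper.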
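To address the difficulty of predicting the power generation of thermal power plants, a prediction method based on the maximal information coefficient (MIC) and extreme gradient boosting (XGBoost) is proposed. The raw data are first preprocessed; MIC is then used to measure the correlation between each feature and the target variable, and the input features are selected by ranking their importance; finally, the XGBoost algorithm is used to build the power generation prediction model. The results show that the model effectively addresses the difficulty of screening nonlinearly related variables and reduces the dimensionality of the input features. The root mean square error and mean absolute percentage error of its predictions are small, indicating high prediction accuracy, and the method can serve as a reference for thermal power plants.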
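The MIC-based input selection and the two error metrics can be sketched as below. It assumes a simplified MIC that searches only equal-frequency grids with kx·ky ≤ n^0.6; because these grids are only a subset of the partitions MIC searches over, the approximation tends to understate the exact statistic. The column names, data, and cut-off `k` are hypothetical. The selected columns would then be used to train a gradient-boosted regressor such as xgboost's `XGBRegressor`; that step is left out so the sketch needs only the standard library.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def equal_frequency_bins(values, k):
    """Map each value to one of k bins holding roughly n/k points each."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def approx_mic(x, y):
    """Hypothetical simplified MIC: best normalized MI over equal-frequency grids, kx*ky <= n**0.6."""
    budget = len(x) ** 0.6
    best = 0.0
    for kx in range(2, int(budget // 2) + 1):
        for ky in range(2, int(budget // kx) + 1):
            mi = mutual_information(equal_frequency_bins(x, kx),
                                    equal_frequency_bins(y, ky))
            best = max(best, mi / math.log(min(kx, ky)))
    return best


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (targets must be non-zero)."""
    return 100 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def select_by_mic(features, target, k):
    """Rank named feature columns by approximate MIC with the target; keep the top k."""
    scores = {name: approx_mic(col, target) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical plant data: generation depends linearly on one input and
# nonlinearly (U-shaped) on another; two further inputs are pure noise.
rng = random.Random(1)
n = 400
coal = [rng.uniform(0, 1) for _ in range(n)]
steam = [rng.uniform(-1, 1) for _ in range(n)]
generation = [c + s * s + rng.gauss(0, 0.05) for c, s in zip(coal, steam)]
features = {"coal_feed": coal, "steam_dev": steam,
            "noise_a": [rng.gauss(0, 1) for _ in range(n)],
            "noise_b": [rng.gauss(0, 1) for _ in range(n)]}
kept, scores = select_by_mic(features, generation, k=2)
print("selected inputs:", kept)
```

The U-shaped input `steam_dev` is nearly uncorrelated with the target in the Pearson sense, yet it is still ranked highly, which illustrates why a MIC-type score helps with nonlinear variables that are otherwise hard to screen.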