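Abstract: A hierarchical feature selection algorithm based on instance correlations (HFSIC) is proposed to further improve the performance of feature selection for hierarchical classification. After a sparse regularization term removes irrelevant features, the parent-child relationships in the hierarchy are combined with the reconstruction relationships among samples in the feature space to learn the instance correlations of the classes under the same subtree, and a recursive regularization term is used to optimize the output feature weight matrix. When instance correlations are measured, the reconstruction coefficient matrix is integrated into the training model, while the l2,1-norm is used to remove irrelevant and redundant features. The accelerated proximal gradient method is used to solve the optimization problem of the proposed model, and the algorithm is evaluated under several evaluation metrics. Experimental results show that the proposed method outperforms the other algorithms on 5 datasets, which verifies its effectiveness.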
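The abstract names two building blocks, l2,1-norm regularization and the accelerated proximal gradient method, without giving the full objective. The sketch below illustrates only those two pieces on a deliberately simplified problem, assuming a plain least-squares loss: minimize 0.5*||XW - Y||_F^2 + lam*||W||_{2,1}. It is not the HFSIC model itself, since the recursive regularization and reconstruction-coefficient terms are omitted. The names `prox_l21`, `apg_l21`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}: shrink each row toward zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Rows whose norm is at most t are set exactly to zero.
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale


def apg_l21(X, Y, lam, n_iter=200):
    """Accelerated proximal gradient (FISTA-style) for the simplified
    problem  min_W 0.5 * ||X W - Y||_F^2 + lam * ||W||_{2,1}.
    Assumption: least-squares loss only, not the full HFSIC objective."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    V, t = W.copy(), 1.0                   # V is the extrapolated point
    for _ in range(n_iter):
        grad = X.T @ (X @ V - Y)
        W_new = prox_l21(V - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        V = W_new + ((t - 1.0) / t_new) * (W_new - W)
        W, t = W_new, t_new
    return W
```

Each row of `W` corresponds to one feature, so a row driven exactly to zero marks a feature that the l2,1 penalty discards for every class at once.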
Funding: Supported by the National Natural Science Foundation of China (No. 51075391)
Abstract: Incomplete data samples seriously impair the effectiveness of data mining. For LRE historical test samples, this paper builds on a correlation analysis of the condition parameters, introduces principal component analysis (PCA), and proposes a PCA-based completion method for incomplete samples. First, the covariance matrix of the complete data set is calculated. Then, a principal matrix is formed from the eigenvectors of the covariance matrix, ordered by descending eigenvalue. Finally, the missing data are estimated from the principal matrix and the known data. A comparison with traditional methods indicates that the proposed method is more effective at completing test samples. An application example shows that the proposed method can increase the usable value of historical test data.
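The three steps in the abstract can be sketched as follows. The abstract does not say how the missing values are computed from the principal matrix, so this sketch assumes one common choice: fit the principal-component scores to the observed entries by least squares, then reconstruct the missing entries from those scores. The function name `pca_impute`, the parameter `k` (number of retained components), and the use of NaN to mark missing entries are all hypothetical.

```python
import numpy as np


def pca_impute(X_complete, x, k):
    """Fill the NaN entries of sample x using the top-k principal
    directions of the complete data set X_complete (rows = samples)."""
    mu = X_complete.mean(axis=0)

    # Step 1: covariance matrix of the complete data set.
    C = np.cov(X_complete, rowvar=False)

    # Step 2: eigenvectors ordered by descending eigenvalue form the
    # principal matrix P (one column per retained component).
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]
    P = eigvecs[:, order]

    # Step 3 (assumed estimator): least-squares scores from the known
    # entries, then reconstruction of the missing entries.
    miss = np.isnan(x)
    obs = ~miss
    z, *_ = np.linalg.lstsq(P[obs], x[obs] - mu[obs], rcond=None)
    x_hat = x.copy()
    x_hat[miss] = mu[miss] + P[miss] @ z
    return x_hat


# Usage: complete samples lie on a line, one sample misses its 2nd value.
t = np.arange(5.0).reshape(-1, 1)
X = 1.0 + t * np.array([[1.0, 2.0, 3.0]])
x = np.array([3.5, np.nan, 8.5])
filled = pca_impute(X, x, k=1)
```

The estimate is only as good as the correlation between parameters: the number of observed entries should be at least `k`, otherwise the least-squares fit for the scores is underdetermined.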