Abstract
The traditional ReliefF algorithm measures feature differences with a binary (0/1) function, so it can neither reflect how much two discrete feature values differ nor remove redundant features. To address these shortcomings, this paper proposes the mRMR-ReliefF (minimum Redundancy Maximum Relevance ReliefF) feature selection algorithm. The algorithm uses probability to overcome the weakness of the binary difference measure and introduces a new difference function, which makes the extracted features better reflect both the within-class relevance and the between-class differences of texts. The algorithm also incorporates inter-word correlation: in addition to selecting feature words that are strongly correlated with the class, it eliminates redundancy among the selected features. Comparative experiments with three algorithms show that the proposed algorithm provides a more effective feature subset for text classification.
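The abstract does not give the new difference function or the exact rule for combining relevance with redundancy, so the Python sketch below is a hypothetical illustration only. It contrasts ReliefF's binary difference with a probability-based difference modeled on the value difference metric (VDM), a stand-in chosen here rather than the paper's formula. It then applies a greedy mRMR-style step that subtracts the average mutual information with already-selected words from each word's ReliefF weight. All function names (`binary_diff`, `make_prob_diff`, `relieff`, `mutual_info`, `mrmr_select`) are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def binary_diff(v1, v2):
    """Classic ReliefF difference for a discrete feature: 0 if equal, else 1."""
    return 0.0 if v1 == v2 else 1.0


def make_prob_diff(column, labels):
    """Hypothetical probability-based difference (VDM-style stand-in, NOT the
    paper's function): half the L1 distance between P(class | value) of the
    two values, so the result lies in [0, 1]."""
    counts = defaultdict(Counter)
    for v, c in zip(column, labels):
        counts[v][c] += 1
    classes = sorted(set(labels))

    def cond(v):
        total = sum(counts[v].values())
        return [counts[v][c] / total for c in classes]

    def diff(v1, v2):
        return 0.5 * sum(abs(a - b) for a, b in zip(cond(v1), cond(v2)))

    return diff


def _dist(X, i, j, diff_fns):
    """Distance between instances i and j: sum of per-feature differences."""
    return sum(fn(X[i][f], X[j][f]) for f, fn in enumerate(diff_fns))


def relieff(X, y, diff_fns, k=1):
    """ReliefF weights; every instance is used once as the sampled instance (m = n).
    diff_fns[f] is the difference function used for feature f."""
    n, d = len(X), len(X[0])
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        for c in prior:
            # k nearest neighbours of instance i inside class c
            others = [j for j in range(n) if j != i and y[j] == c]
            near = sorted(others, key=lambda j: _dist(X, i, j, diff_fns))[:k]
            if not near:
                continue
            for f in range(d):
                s = sum(diff_fns[f](X[i][f], X[j][f]) for j in near) / (len(near) * n)
                if c == y[i]:
                    w[f] -= s  # nearest hits: differences lower the weight
                else:
                    w[f] += prior[c] / (1 - prior[y[i]]) * s  # misses, prior-weighted
    return w


def mutual_info(a, b):
    """Mutual information (nats) between two discrete columns."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr_select(X, weights, k):
    """Hypothetical greedy mRMR step: relevance = ReliefF weight,
    redundancy = mean mutual information with already-selected features."""
    cols = [[row[f] for row in X] for f in range(len(weights))]
    selected, remaining = [], set(range(len(weights)))
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return weights[f]
            red = sum(mutual_info(cols[f], cols[s]) for s in selected) / len(selected)
            return weights[f] - red

        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, if a word's two values occur with the same class distribution, `binary_diff(0, 1)` still returns 1, while the probability-based difference returns 0, so that word no longer inflates the distance between documents. Subtracting raw mutual information from ReliefF weights mixes two different scales; a practical implementation would likely normalize one or both.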
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal
2012, Issue 9, pp. 33-36 (4 pages)
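Funding
National Natural Science Foundation of China (60603047)
Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education of China
Science and Technology Program of Liaoning Province (2008216014)
Scientific Research Fund of the Liaoning Provincial Department of Education for Higher Education Institutions (L2010229)
Dalian Outstanding Young Scientific and Technological Talent Fund (2008J23JH026)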