Abstract
In text categorization, the feature space commonly has tens of thousands of dimensions, often far exceeding the number of available training samples. To speed up text mining algorithms and reduce the memory they occupy, a feature selection method based on an improved simulated annealing algorithm is proposed. To avoid losing the current optimal solution, the method adds a memory function that records the best state found so far, so that simulated annealing becomes an intelligent algorithm. An adaptive temperature update function is designed and a dual threshold is set, which reduces the amount of computation while preserving optimality as far as possible, so that a representative feature subset is obtained quickly. Experimental results show that the proposed method is effective.
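The abstract names three ingredients (best-state memory, an adaptive temperature update, and a dual threshold) without giving their formulas. The following is a minimal Python sketch of how such a search could look, not the authors' implementation. The cooling rule, the two stopping thresholds (`t_min` and `patience`), and the names `anneal_select` and `toy_score` are all assumptions made for illustration.

```python
import math
import random


def anneal_select(n_features, score, t0=1.0, t_min=1e-3, patience=50, seed=0):
    """Simulated-annealing feature selection with best-state memory.

    score(mask) -> float, higher is better; mask is a tuple of 0/1 flags.
    """
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_features))
    current_score = score(current)
    # Memory function: keep the best state seen, even if the walk leaves it.
    best, best_score = current, current_score
    t = t0
    stale = 0  # consecutive steps without a new best state
    # Dual threshold (assumed form): stop when the temperature drops below
    # t_min, or when `patience` steps pass without improving the best state.
    while t > t_min and stale < patience:
        i = rng.randrange(n_features)
        candidate = current[:i] + (1 - current[i],) + current[i + 1:]
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        accepted = delta >= 0 or rng.random() < math.exp(delta / t)
        if accepted:
            current, current_score = candidate, cand_score
        if current_score > best_score:
            best, best_score = current, current_score
            stale = 0
        else:
            stale += 1
        # Adaptive cooling (assumed form): cool faster after an accepted
        # move, more slowly after a rejected one.
        t *= 0.90 if accepted else 0.98
    return best, best_score


def toy_score(mask):
    # Hypothetical objective: features 0-2 are informative,
    # every other selected feature costs 0.1.
    return sum(mask[:3]) - 0.1 * sum(mask[3:])


best, best_score = anneal_select(8, toy_score)
print(best, best_score)
```

In a real text classification setting, `score` would typically be replaced by a classifier's validation accuracy on the selected features, possibly with a penalty on subset size; that choice is likewise not specified in the abstract.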
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University Core Journals)
2010, Issue 4, pp. 8-11 (4 pages)
Computer Engineering and Applications
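Funding
Sichuan Provincial Science and Technology Program Project, No. 2008GZ0003
Science and Technology Research Project of the Sichuan Provincial Department of Science and Technology, No. 07GG006-019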
Keywords
text categorization
feature space
feature selection
simulated annealing algorithm