摘要
从文本特征对文本分类结果的整体影响的角度出发,提出一种基于粒子群优化的文本特征选择方法(PSOTFS),使用粒子群算法来挖掘文本特征选择规则。PSOTFS首先使用开方检验对文本特征进行预选择,然后使用粒子群算法对预选择得到的文本特征进行精选。PSOTFS以一个粒子表示一条特征选择规则,特征选择规则集对应某个粒子群,采用分类准确率作为适应度函数,采用分组的方式对粒子的维度进行降维。实验结果表明,PSOTFS比开方检验、信息增益、文档频率和互信息方法能得到更好的分类效果。
From the perspective of the overall impact of text features on the result of text categorization, a text feature selection method based on particle swarm optimization (PSOTFS)is proposed; to mine the text feature selection rules by PSO algorithm. At first, PSOTFS uses CHI to preselect the text features, then uses PSO algorithm to precisely select the text features from the preselected text features. PSOTFS uses a particle to represent a feature selection rule and the set of feature selection rules corresponds with a particle swarm. At the same time, the classification precision is used as the fitness function and grouping is used to reduce the dimensions of the particles. The experiment result shows that the text cat- egorization effectiveness of PSOTFS is better than that of CHI, information gain, document frequency and mutual information.
出处
《现代图书情报技术》
CSSCI
北大核心
2011年第7期76-81,共6页
New Technology of Library and Information Service
关键词
文本分类
特征选择
文本特征
粒子群优化
开方检验
Text categorization Feature selection Text feature Particle swarm optimization CHI