
Chinese Web page classification based on CFS-GA feature selection algorithm (cited 2 times)
Abstract: To reduce the dimensionality of the feature space and improve the precision of Chinese Web page classification, a method that combines Correlation-based Feature Selection (CFS) with a Genetic Algorithm (GA) is used for feature selection. In the CFS-GA algorithm, each feature subset is treated as a GA chromosome and binary-encoded, and the CFS heuristic merit serves as the GA fitness function for evaluating individuals: the larger an individual's CFS value, the higher its probability of being passed on to the next generation. Owing to the global search capability of GA, the algorithm is designed to find a globally optimal feature subset. Experiments were run on the Weka platform using the Chinese Web page dataset provided by Sogou Labs. The results show that the algorithm effectively reduces the dimensionality of the feature space and improves classification precision.
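The loop described in the abstract (binary chromosomes, CFS merit as fitness, fitter individuals more likely to survive) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses Hall's CFS merit, Merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), where k is the number of selected features, r_cf a feature-class correlation and r_ff a feature-feature intercorrelation. It substitutes absolute Pearson correlation on a binary 0/1 class for the symmetrical uncertainty that CFS typically uses on discretized attributes. The GA operators (roulette-wheel selection, single-point crossover, bit-flip mutation, one elite copy) and every parameter value (`pop_size`, `generations`, `p_cross`, `p_mut`, `seed`) are hypothetical choices, not taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation (assumed stand-in for symmetrical uncertainty).
    Returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def cfs_merit(chromosome, columns, labels):
    """CFS merit of the features whose bit is 1; this is the GA fitness."""
    selected = [i for i, bit in enumerate(chromosome) if bit]
    k = len(selected)
    if k == 0:
        return 0.0
    r_cf = sum(pearson(columns[i], labels) for i in selected) / k
    pairs = [(a, b) for idx, a in enumerate(selected) for b in selected[idx + 1:]]
    r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def roulette(population, fitness, rng):
    """Fitness-proportional selection: a larger CFS merit means a higher pick probability."""
    total = sum(fitness)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(population, fitness):
        acc += fit
        if acc >= pick:
            return chrom
    return population[-1]


def cfs_ga(columns, labels, pop_size=20, generations=30, p_cross=0.8, p_mut=0.05, seed=0):
    """Search binary feature masks with a simple GA; returns (best_mask, best_merit)."""
    rng = random.Random(seed)
    n = len(columns)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=lambda c: cfs_merit(c, columns, labels))
    for _ in range(generations):
        fitness = [cfs_merit(c, columns, labels) for c in population]
        next_gen = [best[:]]  # elitism: keep one copy of the best mask found so far
        while len(next_gen) < pop_size:
            p1 = roulette(population, fitness, rng)
            p2 = roulette(population, fitness, rng)
            if n > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
        gen_best = max(population, key=lambda c: cfs_merit(c, columns, labels))
        if cfs_merit(gen_best, columns, labels) > cfs_merit(best, columns, labels):
            best = gen_best[:]
    return best, cfs_merit(best, columns, labels)


if __name__ == "__main__":
    # Hypothetical toy data: 4 term features (columns) over 6 pages, binary class.
    y = [0, 0, 1, 1, 0, 1]
    cols = [
        [0, 0, 1, 1, 0, 1],  # perfectly correlated with the class
        [0, 0, 1, 1, 0, 1],  # redundant copy of feature 0
        [1, 0, 1, 0, 1, 0],  # weakly related
        [5, 5, 5, 5, 5, 5],  # constant, carries no information
    ]
    subset, merit = cfs_ga(cols, y)
    print("selected features:", [i for i, b in enumerate(subset) if b], "merit:", round(merit, 3))
```

The denominator of the merit grows with feature-feature intercorrelation, so adding a redundant copy of a relevant feature does not raise the score; this is how CFS favours small, non-redundant subsets. Keeping one elite copy means the best mask found so far is never lost, but the search remains heuristic: a larger population or more generations raise, without guaranteeing, the chance of reaching the best subset.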
Source: Journal of Shanghai Maritime University (Peking University core journal), 2012, No. 1, pp. 77-81 (5 pages)
Funding: National Natural Science Foundation of China (61175044)
Keywords: Chinese Web page classification; feature selection; correlation-based feature selection; genetic algorithm


