Abstract
Automatic web page classification is studied so that users can quickly find the pages they need. The number of web pages is very large, and web data is semi-structured, massive, high-dimensional text. Traditional text classification methods cannot reduce the dimensionality or eliminate redundant information, so they are prone to the curse of dimensionality and yield low classification accuracy, which makes it hard for users to find the pages they need. To improve classification accuracy, an automatic web page classification method based on principal component analysis and a support vector machine is proposed. First, the web page data is preprocessed and feature vectors are extracted, eliminating redundant information. Next, principal component analysis reduces the dimensionality of the feature vectors. Finally, a support vector machine classifies the web pages. Simulation experiments on a web page dataset show that classification accuracy exceeds 95% and that classification is also faster, indicating that the principal component support vector machine is an effective web page classification method.
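The three stages described in the abstract (feature extraction, dimensionality reduction by principal component analysis, classification by a support vector machine) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of scikit-learn, TF-IDF as the feature representation, the RBF kernel, the number of components, and the toy documents are all assumptions made for illustration only.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def train_pca_svm(texts, labels, n_components=2):
    """Fit the three stages on already-cleaned page text.

    Assumptions (not from the paper): TF-IDF features, an RBF-kernel
    SVM, and a caller-chosen number of principal components.
    """
    # Stage 1: turn each page's text into a feature vector.
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts).toarray()  # PCA needs dense input

    # Stage 2: project the feature vectors onto the leading principal components.
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(features)

    # Stage 3: train the support vector machine on the reduced vectors.
    classifier = SVC(kernel="rbf")
    classifier.fit(reduced, labels)
    return vectorizer, pca, classifier


def classify(model, texts):
    """Apply the fitted stages, in the same order, to unseen page text."""
    vectorizer, pca, classifier = model
    features = vectorizer.transform(texts).toarray()
    return classifier.predict(pca.transform(features))


# Hypothetical toy corpus; a real experiment would use a labelled web page dataset.
TOY_TEXTS = [
    "football match score league goal",
    "league team goal football season",
    "match team score season football",
    "stock market shares price trading",
    "market price bank shares investor",
    "trading investor stock bank market",
]
TOY_LABELS = ["sports", "sports", "sports", "finance", "finance", "finance"]

if __name__ == "__main__":
    model = train_pca_svm(TOY_TEXTS, TOY_LABELS, n_components=2)
    print(classify(model, ["football league season", "bank shares price"]))
```

In a sketch like this the number of retained components is the main tuning knob: fewer components make SVM training faster but discard more of the variance in the original features.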
Source
《计算机仿真》
CSCD
Peking University Core Journals (北大核心)
2011, No. 10, pp. 121-124, 252 (5 pages)
Computer Simulation
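Funding
Hunan Provincial Science and Technology Department Program Project (2010FJ3024)
Hunan University of Technology Teaching Reform Research Project (09A02)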