
Multi-view canonical correlation analysis based Web spam detection

Cited by: 3
Abstract: The features of Web spam pages are first divided into two views, a content-feature view and a link-feature view. Canonical correlation analysis (CCA) and its improved variants are then applied for feature extraction, generating two new feature sets for each Web page. The two new feature sets are combined in different ways to produce single-view data, which serves as the training data for building classifiers. Experimental results show that treating Web spam pages as two-view data and applying multi-view canonical correlation analysis can effectively improve the accuracy of Web spam detection.
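The pipeline summarized in the abstract can be illustrated roughly as follows. This is a minimal sketch rather than the authors' implementation: the data are synthetic, the function names `cca_fit` and `combine`, the ridge term `reg`, the two combination modes, and the nearest-centroid classifier are all assumptions made for illustration, since the abstract does not specify the CCA variants, the combination schemes, or the classifiers used.

```python
import numpy as np

def cca_fit(X, Y, k, reg=1e-3):
    """Plain linear CCA via whitening and SVD (hypothetical helper)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Covariance blocks; reg is an assumed ridge term for numerical stability.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    # Projection matrices for each view and the top-k canonical correlations.
    return mx, my, Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]

def combine(Zx, Zy, mode="concat"):
    """Merge the two projected views into single-view data (assumed schemes)."""
    if mode == "concat":
        return np.hstack([Zx, Zy])
    if mode == "sum":
        return Zx + Zy
    raise ValueError(f"unknown mode: {mode}")

# Synthetic stand-in for spam data: both views depend on one latent signal.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                      # 1 = spam, 0 = normal
z = (2 * y - 1) + 0.3 * rng.standard_normal(n)
content = np.outer(z, np.ones(6)) + 0.5 * rng.standard_normal((n, 6))
link = np.outer(z, np.ones(4)) + 0.5 * rng.standard_normal((n, 4))
train, test = slice(0, 150), slice(150, n)

# Step 1: extract correlated features from the two views (training data only).
mx, my, Wx, Wy, corrs = cca_fit(content[train], link[train], k=2)

# Step 2: project both views and combine them into single-view data.
Z_train = combine((content[train] - mx) @ Wx, (link[train] - my) @ Wy)
Z_test = combine((content[test] - mx) @ Wx, (link[test] - my) @ Wy)

# Step 3: train a classifier; nearest centroid is used here as a placeholder.
centroids = np.stack([Z_train[y[train] == c].mean(axis=0) for c in (0, 1)])
dist = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
pred = dist.argmin(axis=1)
acc = float((pred == y[test]).mean())
print("canonical correlations:", corrs)
print("test accuracy:", acc)
```

Passing `mode="sum"` to `combine` gives a second way of fusing the two views. Any standard classifier could replace the nearest-centroid step.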
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2013, No. 3, pp. 810-813 (4 pages)
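Funding: National Natural Science Foundation of China (61170145); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20113704110001); Shandong Provincial Natural Science Foundation and Science and Technology Research Program (ZR2010FM021, 2008B0026, 2010G0020115)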
Keywords: Web spam detection; canonical correlation analysis (CCA); multi-view classification; feature extraction

Co-cited references (25)

  • 1 Zhao Lihong, Sun Yuge, Cai Yu, Xu Xinhe. Face recognition based on kernel principal component analysis[J]. Journal of Northeastern University (Natural Science), 2006, 27(8): 847-850. Cited by: 16
  • 2 Ho T K. The random subspace method for constructing decision forests[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(8): 832-844.
  • 3 Hagan M T, Demuth H B, Beale M H. Neural network design[M]. Boston: PWS, 1996.
  • 4 Breiman L. Bagging predictors[J]. Machine Learning, 1996, 24(2): 123-140.
  • 5 Shalev-Shwartz S, Singer Y. On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms[J]. Machine Learning, 2010, 80(2-3): 141-163.
  • 6 Skurichina M, Duin R P W. Bagging, boosting and the random subspace method for linear classifiers[J]. Pattern Analysis & Applications, 2002, 5(2): 121-135.
  • 7 Kira K, Rendell L A. A practical approach to feature selection[C]//Proceedings of the Ninth International Workshop on Machine Learning. Morgan Kaufmann Publishers Inc., 1992: 249-256.
  • 8 Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF[J]. Machine Learning, 2003, 53(1-2): 23-69.
  • 9 Becchetti L, Castillo C, Donato D, et al. Web spam detection: Link-based and content-based techniques[C]//The European Integrated Project Dynamically Evolving, Large Scale Information Systems (DELIS): Proceedings of the Final Workshop. 2008, 222: 99-113.
  • 10 Castillo C, Donato D, Gionis A, et al. Know your neighbors: Web spam detection using the web topology[C]//Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2007: 423-430.
