Research on a Webpage Classification Algorithm Based on Deep Learning (cited by: 3)

Webpage Classification Based on Deep Learning Algorithm
Abstract: Webpage classification filters information and presents it accurately to users, which improves the accuracy of information retrieval. Deep learning is a relatively new area of machine learning. In essence it is a multi-layer neural network learning algorithm that reaches very high accuracy by initializing the network layer by layer, and it has been applied repeatedly to image recognition, speech recognition, and text classification. This paper proposes a webpage classification algorithm based on deep learning. The experimental data show that the method effectively improves the accuracy of webpage classification.
Key words: Webpage Classification; Deep Learning; Stacked Auto Encoder; TFIDF
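The pipeline named in the abstract and key words (TFIDF features feeding a stacked auto encoder that is initialized layer by layer) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no architecture or hyperparameters, so the TF-IDF variant (term frequency times log(N/df)), the tied-weight layers, the layer sizes, the learning rate, and the toy documents are all hypothetical, and the supervised fine-tuning and classifier stage is omitted.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, TF-IDF row vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def encode(data, W, b):
    """Map each input vector to its hidden code using encoder weights W and biases b."""
    return [[sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
             for row, bj in zip(W, b)] for x in data]


def train_autoencoder(data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one tied-weight autoencoder layer by SGD on squared reconstruction error."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden  # encoder biases
    c = [0.0] * n_in      # decoder biases
    for _ in range(epochs):
        for x in data:
            h = encode([x], W, b)[0]
            # Linear decoder that reuses the encoder weights (tied weights).
            r = [sum(W[j][i] * h[j] for j in range(n_hidden)) + c[i] for i in range(n_in)]
            err = [ri - xi for ri, xi in zip(r, x)]
            # Error backpropagated to the hidden pre-activations.
            dh = [sum(W[j][i] * err[i] for i in range(n_in)) * h[j] * (1.0 - h[j])
                  for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(n_in):
                    W[j][i] -= lr * (err[i] * h[j] + dh[j] * x[i])
                b[j] -= lr * dh[j]
            for i in range(n_in):
                c[i] -= lr * err[i]
    return W, b


def pretrain_stack(data, layer_sizes):
    """Greedy layer-wise initialization: each layer is trained on the previous layer's codes."""
    layers, codes = [], data
    for size in layer_sizes:
        W, b = train_autoencoder(codes, size)
        layers.append((W, b))
        codes = encode(codes, W, b)
    return layers, codes


# Toy, hypothetical "webpages" already split into tokens.
docs = [["sports", "football", "news"],
        ["finance", "stock", "news"],
        ["sports", "tennis", "news"]]
vocab, rows = tfidf(docs)
layers, codes = pretrain_stack(rows, [4, 2])
```

In a full system the pretrained `layers` would initialize a deeper network whose top-level codes feed a supervised classifier; that stage is left out here because the abstract does not describe it.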
Authors: Chen Qianxi (陈芊希), Fan Lei (范磊)
Source: Microcomputer Applications (《微型电脑应用》), 2016, No. 2, pp. 25-28 (4 pages)
Funding: Shanghai Major and Key Basic Research Project (No. 13JC1403500)