Abstract
With the rapid development of the Internet, the wealth of web page data available online has become a major source of information for many areas of applied research. Web page classification is a key technique in information organization, management, and retrieval, and it has advanced considerably in recent years. This paper first summarizes research trends in the underlying techniques: web page preprocessing, feature selection and extraction, web page representation models, classification algorithms, and evaluation metrics. It then surveys and analyzes recent advances in web page classification methods. Finally, it discusses the main challenges facing the field and its likely future directions.
Author
薛永大
XUE Yong-da (Civil Aviation University of China, Tianjin 300300, China)
Source
《电脑知识与技术》
2012, No. 9, pp. 5958-5961 (4 pages)
Computer Knowledge and Technology
Keywords
web page classification
web page representation model
feature selection and extraction
classification algorithm
evaluation metrics