Abstract
With the development of information technology, Internet data have grown rapidly, making Web information harder to manage and access. To organize and manage this massive volume of Web pages effectively, Web information is usually classified according to a large-scale hierarchy of concepts or topic categories, so that these resources are easier to search and access. In this process, the large-scale hierarchical classification problem studies how to assign Web documents accurately to the categories of the class hierarchy. This paper surveys that problem. First, it gives a definition of the large-scale hierarchical classification problem that describes it at an abstract level, and analyzes strategies for solving it. Second, it categorizes the methods for solving the problem and, on the basis of this categorization, introduces and compares typical methods. Finally, it summarizes these methods and points out future research directions.
Source
Chinese Journal of Computers (计算机学报)
Indexed in: EI, CSCD, Peking University Core Journals
2012, No. 10, pp. 2101-2115 (15 pages)
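Funding
National High-Tech Research and Development Program of China ("863" Program) (2010AA012505, 2011AA010702, 2012AA01A401, 2012AA01A402)
National Natural Science Foundation of China (60933005)
National Key Technology Research and Development Program of China (2012BAH38B04)
National 242 Information Security Program (2011A010)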
Keywords
text categorization
large-scale hierarchical classification
class hierarchy
tree-structured class hierarchy