Abstract
Automatic classification of Web documents is an important research topic in Web mining. The document vector space model (VSM) is the foundation of automatic document classification, but eliminating redundant attributes and reducing the dimensionality of the vector space remain difficult. This paper applies rough set theory to generalize the data of the information system formed by the set of sample documents and to compute the optimal reduced attribute set of the documents. This greatly reduces the dimensionality of the document feature space and lessens the interference of redundant attributes with document classification, which improves classification efficiency. In addition, the adaptive classification and incremental learning capabilities of the Fuzzy ARTMAP (adaptive resonance theory mapping) neural network are used to achieve online adaptive classification of Web documents.
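The attribute-reduction step described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a toy decision table in which each document is a row of binary term-presence attributes with a class label, and it uses a simple backward-elimination heuristic that keeps the rough-set dependency degree unchanged. The names `dependency`, `greedy_reduct`, and the sample terms are hypothetical, and the result is a reduct but not necessarily the smallest one.

```python
from collections import defaultdict

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: the fraction of rows whose values on
    `attrs` fall in an equivalence class with a single label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        groups[key].append(label)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, labels, attrs):
    """Drop each attribute in turn if the dependency degree is preserved.
    Assumption: a backward-elimination heuristic, chosen for brevity."""
    target = dependency(rows, labels, attrs)
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if dependency(rows, labels, trial) == target:
            kept = trial
    return kept

# Hypothetical toy data: term presence (1/0) in four sample documents.
attrs = ["goal", "match", "stock", "page"]
rows = [
    {"goal": 1, "match": 1, "stock": 0, "page": 1},
    {"goal": 1, "match": 0, "stock": 0, "page": 1},
    {"goal": 0, "match": 0, "stock": 1, "page": 1},
    {"goal": 0, "match": 1, "stock": 1, "page": 1},
]
labels = ["sports", "sports", "finance", "finance"]

reduct = greedy_reduct(rows, labels, attrs)
print(reduct)
```

On this toy table the attribute "page" appears in every document and so carries no class information, while a single remaining term is enough to separate the two classes. In a real pipeline the reduced attribute set would define the lower-dimensional feature vectors passed to the classifier.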
Source
Journal of Chongqing University (Natural Science Edition)
EI
CAS
CSCD
PKU Core Journal
2003, No. 7, pp. 47-51 (5 pages)
Keywords
web page classification
rough sets
attribute reduction
online adaptive classification
Web documents