Abstract
Rough set theory is a mathematical tool for handling uncertain or vague knowledge. This paper proposes a Web document classification method that combines rough set theory with latent semantic indexing. Web documents are first represented with the vector space model; singular value decomposition of the resulting matrix is then used for information filtering and latent semantic indexing. An attribute reduction algorithm generates the classification rules, and documents are finally classified using multiple knowledge bases. Comparative experiments show that the method achieves good classification performance.
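The abstract does not spell out the attribute reduction algorithm, so the following is only a minimal sketch of the standard rough set notions such an algorithm usually rests on: indiscernibility classes, lower and upper approximations, and the dependency degree of a decision attribute on a set of condition attributes. The toy decision table, the feature names `f1` and `f2`, and the class labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group object ids into indiscernibility classes over the given attributes."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def lower_upper(table, attrs, target):
    """Return the lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


def dependency(table, cond_attrs, decision):
    """Size of the positive region divided by the number of objects."""
    positive = set()
    for decision_block in partition(table, [decision]):
        lower, _ = lower_upper(table, cond_attrs, decision_block)
        positive |= lower
    return len(positive) / len(table)


# Hypothetical decision table: discretized features f1, f2 and a class label.
docs = {
    "d1": {"f1": 1, "f2": 0, "cls": "sport"},
    "d2": {"f1": 1, "f2": 0, "cls": "news"},
    "d3": {"f1": 0, "f2": 1, "cls": "news"},
    "d4": {"f1": 0, "f2": 0, "cls": "sport"},
}

sport = {d for d, row in docs.items() if row["cls"] == "sport"}
lower, upper = lower_upper(docs, ["f1", "f2"], sport)
gamma_full = dependency(docs, ["f1", "f2"], "cls")
gamma_f2 = dependency(docs, ["f2"], "cls")
```

Here `d1` and `d2` share the same feature values but carry different labels, so they fall into the boundary region: `lower` contains only `d4`, while `upper` contains `d1`, `d2`, and `d4`. A reduction procedure would typically drop an attribute only if the dependency degree stays unchanged; in this toy table, dropping `f1` lowers it from 0.5 to 0.25, so `f1` would be kept.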
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2004, Issue 13, pp. 3-5 (3 pages)
Keywords
Rough set
Latent semantic indexing
Web document classification
Information filtering
Information retrieval