Web Document Classification Based on Rough Set Latent Semantic Indexing

Cited by: 7
Abstract: Rough set theory is a mathematical tool for handling uncertain or vague knowledge. This paper proposes a Web document classification method based on rough set theory and latent semantic indexing. Web documents are first represented with the vector space model; singular value decomposition of the matrix is then used for information filtering and latent semantic indexing. An attribute reduction algorithm generates the classification rules, and documents are finally classified using multiple knowledge bases. Experimental comparison with other methods indicates that the approach achieves good classification performance.
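The abstract does not say which attribute reduction algorithm is used, so the following is only a minimal sketch of one common rough set approach: a greedy, QuickReduct-style search driven by the dependency degree (the share of documents in the positive region). It assumes the document features have already been discretized, for example by binning latent semantic indexing coordinates. The function names `partition`, `dependency` and `quick_reduct` and the toy data are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Fraction of rows lying in blocks whose members all share one label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree the most."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct


# Toy decision table (assumed data): three discretized features per document.
# The class is "A" only when features 0 and 1 are both 0; feature 2 is noise.
rows = [
    (0, 0, 0),
    (0, 0, 1),
    (0, 1, 0),
    (0, 1, 1),
    (1, 0, 0),
    (1, 1, 1),
]
labels = ["A", "A", "B", "B", "B", "B"]

print(quick_reduct(rows, labels))  # → [1, 0]
```

On this toy table the noisy third feature is dropped, and the two remaining features are enough to separate the classes. A greedy search of this kind returns a reduct that preserves the dependency degree but is not guaranteed to be the smallest one.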
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD and the PKU Core list (北大核心), 2004, No. 13, pp. 3-5 (3 pages)
Keywords: Rough set; Latent semantic indexing; Web document classification; Information filtering; Information retrieval

References (5)

  • 1. Pawlak Z. Rough Sets. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356
  • 2. Pawlak Z, Grzymala-Busse J. Rough Sets. Communications of the ACM, 1995, 38(11): 88-95
  • 3. Deerwester S, Dumais S, Furnas G, et al. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 1990, 41(6): 391-407
  • 4. Bao Yongguang, Aoyama S, Du Xiaoyong. A Rough Set-based Hybrid Method to Text Categorization. Second International Conference on Web Information Systems Engineering (WISE'01), Volume 1, 2002: 254-261
  • 5. Chouchoulas A, Shen Q. A Rough Set-Based Approach to Text Classification. In 7th International Workshop, RSFDGrC'99, Yamaguchi, Japan, 1999: 118-129
