
A Text Classification Model Based on the Latent Semantic Structure
Cited by: 27
Abstract: The Latent Semantic Indexing (LSI) model can alleviate the problems of polysemy and synonymy to a certain degree and can filter out some of the noise in raw documents. However, some features that contribute strongly to classification are discarded in LSI because their corresponding singular values are small. To address this problem, a text classification model that extends LSI is proposed. While retaining as much document information as possible, the model additionally takes the class information of the training documents into account, so it can represent the latent semantic structure of the original document space better than the LSI model.
Source: Journal of South China University of Technology (Natural Science Edition), 2004, Issue z1, pp. 99-102 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Keywords: text classification; latent semantic indexing; partial least squares analysis
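
The effect described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the paper's algorithm, so what follows is not the authors' model. It assumes, based only on the keyword "partial least squares", that class information enters through a PLS-style weight vector proportional to X^T y. The toy matrix, the labels, and all names (`X`, `y`, `lsi_dir`, `pls_dir`, `abs_corr`) are hypothetical, and the document-term matrix is mean-centered for simplicity, which classical LSI does not require.

```python
import numpy as np

# Hypothetical toy document-term matrix: rows are documents, columns are terms.
# Terms 0 and 1 are frequent but unrelated to the class label;
# term 2 is rare but perfectly aligned with the class label.
X = np.array([[4.0, 4.0, 1.0],
              [0.0, 0.0, 1.0],
              [4.0, 4.0, 0.0],
              [0.0, 0.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # assumed binary class labels

# Center columns and labels (a simplifying assumption of this sketch).
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# LSI-style reduction: keep only the direction with the largest singular value.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
lsi_dir = Vt[0]
lsi_scores = Xc @ lsi_dir

# PLS-style first direction: the weight vector is proportional to X^T y,
# so the class labels decide which term directions are kept.
w = Xc.T @ yc
pls_dir = w / np.linalg.norm(w)
pls_scores = Xc @ pls_dir


def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])


print("singular values:", s)
print("LSI score vs. class correlation:", abs_corr(lsi_scores, y))
print("PLS score vs. class correlation:", abs_corr(pls_scores, y))
```

In this toy case the class-relevant term lies along the direction with the small singular value, so the rank-1 LSI projection is essentially uncorrelated with the labels, while the label-aware direction separates the two classes.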
