
Research on Text Classification Based on Combining LSI with SVM (基于LSI和SVM相结合的文本分类研究) Cited by: 1
Abstract: The traditional vector space model represents a text by its keywords, but it does not account for polysemy (one word with several meanings) or synonymy (several words with one meaning). To address this problem, a text classification method that combines latent semantic indexing (LSI) with a support vector machine (SVM) is proposed, in which LSI is used to obtain the latent semantic structure of the original feature vectors. Experimental results show that, compared with using the SVM alone, the method loses a small amount of classification accuracy while greatly reducing the dimensionality of the feature vectors.
Authors: Liu Yang (刘洋), Zhang Qiuyu (张秋余)
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal (北大核心), 2007, No. 23, pp. 5762-5764 (3 pages)
Funding: Gansu Province Key Science and Technology Program fund project (2GS047-A52-002-03)
Keywords: latent semantic indexing; singular value decomposition; support vector machine; text classification; machine learning
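
The pipeline summarized in the abstract (keyword vectors, then LSI via singular value decomposition, then an SVM on the reduced vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the choice of k = 2, the helper names (`vectorize`, `train_linear_svm`, `predict`) and the plain sub-gradient trainer standing in for a full SVM solver are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: label +1 = vehicles, -1 = finance.
# "car" and "automobile" never co-occur, mimicking synonymy.
corpus = [
    ("car engine wheel", 1),
    ("automobile engine road", 1),
    ("car road wheel", 1),
    ("automobile wheel engine", 1),
    ("stock market price", -1),
    ("market trade price", -1),
    ("stock trade bank", -1),
    ("bank market stock", -1),
]
texts = [t for t, _ in corpus]
y = np.array([label for _, label in corpus], dtype=float)

# Vector space model: each text becomes a vector of raw term counts.
vocab = {w: i for i, w in enumerate(sorted({w for t in texts for w in t.split()}))}

def vectorize(text):
    """Map a text to its term-count vector; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for word in text.split():
        if word in vocab:
            v[vocab[word]] += 1.0
    return v

X = np.array([vectorize(t) for t in texts])  # documents x terms

# LSI step: a truncated SVD keeps the k strongest latent directions.
k = 2  # assumed value, chosen only for this toy example
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:k].T   # terms x k projection matrix
Z = X @ P      # documents x k reduced feature vectors

def train_linear_svm(Z, y, lam=0.01, lr=0.1, epochs=500):
    """Batch sub-gradient descent on the L2-regularised hinge loss."""
    n = len(y)
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (Z @ w + b) < 1  # samples violating the margin
        grad_w = lam * w - (y[viol][:, None] * Z[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear_svm(Z, y)

def predict(text):
    """Project a new text into the latent space, then apply the linear SVM."""
    return 1 if vectorize(text) @ P @ w + b >= 0 else -1

train_acc = np.mean([predict(t) == label for t, label in corpus])
print(X.shape, "->", Z.shape, "train accuracy:", train_acc)
print(predict("automobile road"), predict("bank price"))
```

In this sketch the ten-term count vectors are reduced to two latent dimensions before classification, which mirrors the dimensionality reduction the abstract describes. The toy data says nothing about the accuracy trade-off reported in the paper.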

