
Research of Chinese spectral clustering with LSA

Cited by: 3
Abstract: In traditional spectral text clustering, the document similarity matrix is built on the vector space model, which ignores the semantic relationships between terms. This leads to very high term-frequency dimensionality, feature redundancy, and high computational cost. To address these problems, this paper proposes a method for constructing the document similarity matrix based on latent semantic analysis (LSA). Singular value decomposition (SVD) is used to reduce the rank of the term-document matrix, so that documents are represented in a low-dimensional semantic space, which raises the semantic similarity between documents of the same class. In comparative experiments, the improved method produced better clustering than the traditional method, improving clustering accuracy and reducing data-processing time, which supports its effectiveness and feasibility.
Source: Application Research of Computers (《计算机应用研究》), 2010, No. 3, pp. 917-918 (2 pages). Indexed in CSCD; Peking University core journal.
Keywords: text clustering; latent semantic analysis (LSA); singular value decomposition (SVD); spectral clustering
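The pipeline the abstract describes can be sketched in NumPy: build a term-document matrix, truncate its SVD to rank k (A ≈ U_k Σ_k V_kᵀ), compute document similarities in the reduced space, then run spectral clustering on that similarity matrix. This is a minimal sketch under stated assumptions, not the paper's implementation. The term-document matrix is assumed to be built already (for Chinese text, word segmentation and term weighting happen upstream). The function names are hypothetical. Negative cosine similarities are clipped to zero. The spectral step uses the normalized-cut relaxation of Shi and Malik followed by a simple k-means, which may differ from the exact variant used in the paper.

```python
import numpy as np


def lsa_doc_vectors(A: np.ndarray, k: int) -> np.ndarray:
    """Map documents (columns of the terms x documents matrix A) into a k-dim LSA space."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rank-k truncation A ~ U_k S_k V_k^T; each document becomes a row of V_k S_k
    return Vt[:k].T * s[:k]


def cosine_similarity_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of X, used as graph edge weights."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    # Assumption: LSA cosines can be negative, graph weights must not be, so clip at zero
    W = np.clip(Xn @ Xn.T, 0.0, None)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def kmeans(X: np.ndarray, c: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, c):
        dist = np.min([((X - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(c)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(W: np.ndarray, c: int) -> np.ndarray:
    """Normalized spectral clustering following the Shi-Malik normalized-cut relaxation."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(d == 0, 1.0, d))
    # Symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, U = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # If L_sym u = lambda u, then v = D^{-1/2} u solves (D - W) v = lambda D v
    H = U[:, :c] * d_inv_sqrt[:, None]
    return kmeans(H, c)


def lsa_spectral_clustering(A: np.ndarray, k: int, c: int) -> np.ndarray:
    """Cluster the columns (documents) of A into c groups using a k-dim LSA space."""
    return spectral_clustering(cosine_similarity_matrix(lsa_doc_vectors(A, k)), c)


if __name__ == "__main__":
    # Hypothetical toy counts: terms 0-2 belong to topic 1, terms 3-5 to topic 2
    A = np.array([
        [3, 2, 2, 0, 0, 0],
        [2, 3, 2, 0, 0, 0],
        [2, 2, 3, 0, 0, 0],
        [0, 0, 0, 2, 1, 1],
        [0, 0, 0, 1, 2, 1],
        [0, 0, 0, 1, 1, 2],
    ], dtype=float)
    print(lsa_spectral_clustering(A, k=2, c=2))  # prints [0 0 0 1 1 1]
```

On the toy matrix, the rank-2 LSA space places the first three documents on one axis and the last three on another, so the similarity graph splits cleanly into two groups. The deterministic k-means initialisation keeps the sketch reproducible. A production version would more likely use a library k-means with several random restarts.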