An improved density peak algorithm for micro-learning unit text clustering based on LSA model
Cited by: 5
Abstract: The explosive growth of micro-learning resources has produced a large volume of unorganized text, and the fragmented form of these resources makes them inconvenient for learners to use. To help learners find content suited to personalized learning among fragmented resources, micro-learning resources in text form need to be clustered. This paper therefore applies an improved density peak algorithm to micro-learning unit text clustering. When the density peak algorithm is used for clustering in this domain, it suffers from a high-dimensional sparse vector space, insufficient global consistency, sensitivity to the cutoff distance, and the need for manual supervision when selecting density peak centers. To address these problems, the texts are modeled with Latent Semantic Analysis (LSA) and two improvements are proposed. First, local density is redefined to suit the clustering requirements and density-sensitive distance is introduced as the clustering criterion; resolving the cutoff-distance sensitivity in turn resolves the global-consistency problem in cluster assignment. Second, linear fitting is used to detect outliers, which identifies the density peak centers automatically, so that centers are selected without manual supervision. Experiments on a real data set of micro-learning units show that the proposed algorithm is better suited to micro-learning unit text clustering than the original density peak algorithm and other classical clustering algorithms.
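The abstract does not give the paper's formulas, so the following is only a minimal sketch of the baseline density peak procedure with one plausible reading of "linear fitting to find outliers". It assumes Euclidean distance and a Gaussian-kernel local density rather than the paper's redefined density and density-sensitive distance, and it assumes that centers are the points whose decision value gamma = rho * delta lies far above a least-squares line fitted to the sorted gamma values. The names `fit_line`, `density_peaks`, `d_c`, and `k_sigma` are hypothetical, and 2-D points stand in for LSA-reduced document vectors.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def density_peaks(points, d_c, k_sigma=1.5):
    """Baseline density peaks with an assumed residual-based center pick."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density with a Gaussian kernel (a smooth form of the cutoff count).
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # delta[i]: distance to the nearest point that ranks higher in density.
    # The densest point instead gets its largest distance to any point.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    # Assumed center rule: sort gamma in descending order, fit a line to
    # (rank, gamma), and flag points whose residual exceeds k_sigma times
    # the root-mean-square residual. The densest point is always a center.
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -gamma[i])
    xs = list(range(n))
    ys = [gamma[i] for i in ranked]
    a, b = fit_line(xs, ys)
    resid = [y - (a * x + b) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in resid) / n)
    centers = sorted({order[0]} |
                     {ranked[x] for x in xs if resid[x] > k_sigma * sigma})

    # Each remaining point inherits the label of its higher-density parent,
    # visited in order of decreasing density so the parent is labeled first.
    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Toy input: two 3x3 grids of points, far apart from each other.
points = [(x, y) for x in range(3) for y in range(3)]
points += [(x + 10, y + 10) for x in range(3) for y in range(3)]
labels, centers = density_peaks(points, d_c=1.5)
print(centers, labels)
```

The threshold `k_sigma` and the choice of fitting gamma against rank are illustrative only; on larger data sets the sorted gamma curve is usually convex, so a residual rule of this kind would need a guard against flagging low-gamma points at the tail.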
Authors: WU Guo-sheng; ZHANG Yue-qin (College of Information and Computer Science, Taiyuan University of Technology, Jinzhong 030600, China)
Source: Computer Engineering & Science (计算机工程与科学), indexed in CSCD and the Peking University Core Journals list, 2020, No. 4, pp. 722-732 (11 pages)
Funding: Natural Science Foundation of Shanxi Province (201701D121057).
Keywords: micro-learning; text clustering; density-based clustering; LSA; density-sensitive distance; linear fitting
