DBSCAN优化算法在实验文本大数据分析中的应用研究 (cited by: 1)

Application and Research of DBSCAN Optimization Algorithm in Big Data Analysis of Experimental Text
Abstract: Big data has become a research hotspot in computer science in recent years, and clustering can address problems across the big data field, such as data mining, machine learning, and text processing. The traditional DBSCAN algorithm requires its parameters to be set manually, and its speed cannot keep up with big data applications. To address these problems, this paper proposes an optimized DBSCAN algorithm. A KD-tree is used to speed up the search for neighborhood objects, which significantly reduces the running time of the algorithm. The density threshold parameter (MinPts) is made adaptive by computing the mathematical expectation over all neighborhood objects. A text clustering workflow is then designed in which the SD-TF-IDF algorithm optimizes the weights of feature terms before the texts are clustered. Finally, the method is applied to mining and analyzing big data from university computer laboratory texts, with good results.
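The two algorithmic ideas in the abstract, KD-tree neighborhood search and an adaptive MinPts, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so MinPts is assumed here to be the mean neighborhood size (each point counted in its own neighborhood), Eps is assumed to be supplied by the caller, and the names `build_kdtree`, `radius_query`, and `adaptive_dbscan` are hypothetical.

```python
import math
from collections import deque

def build_kdtree(points, indices=None, depth=0):
    """Build a KD-tree as nested tuples: (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2                  # median point becomes the node
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))

def radius_query(node, points, target, eps, found):
    """Append to `found` the indices of all points within eps of target."""
    if node is None:
        return
    idx, axis, left, right = node
    if math.dist(points[idx], target) <= eps:
        found.append(idx)
    diff = target[axis] - points[idx][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    radius_query(near, points, target, eps, found)
    if abs(diff) <= eps:                     # far side can still hold neighbors
        radius_query(far, points, target, eps, found)

def adaptive_dbscan(points, eps):
    """DBSCAN with MinPts set to the mean neighborhood size (an assumption)."""
    tree = build_kdtree(points)
    neighbors = []
    for p in points:
        found = []
        radius_query(tree, points, p, eps, found)
        neighbors.append(found)
    # Assumed reading of "mathematical expectation over all neighborhood
    # objects": the average number of points per Eps-neighborhood.
    min_pts = sum(len(n) for n in neighbors) / len(points)
    labels = [-1] * len(points)              # -1 means noise or not yet assigned
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                         # already clustered, or not a core point
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:                         # expand the cluster breadth-first
            j = queue.popleft()
            if labels[j] != -1:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts: # only core points keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels, min_pts

# Two tight groups and one isolated point, as toy data.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels, min_pts = adaptive_dbscan(points, eps=1.5)
```

On this toy data the eight grouped points each have four neighbors within Eps and the isolated point has one, so the assumed MinPts is 33/9 and the isolated point is left as noise. The KD-tree only prunes subtrees that lie entirely outside the query radius, which is where the claimed speed-up over a brute-force scan would come from.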
Source: Computer Science and Application (《计算机科学与应用》), 2020, No. 5, pp. 906-913 (8 pages)
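Funding: University Industry-University-Research Innovation Fund of the Science and Technology Development Center, Ministry of Education, New Generation Information Technology Innovation Project (2018A01015, 2018A02027); National Natural Science Foundation of China (61871475, 61471133).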