Abstract
The arrival of the big data era has drawn growing attention to the Python language. Internationally, Python has ranked first for several consecutive years in the interactive top programming languages ranking published by IEEE; in China, Python has entered the primary school curriculum of compulsory education. Python is popular with a growing number of computer users because of its readability and its wide range of applications. Its strong performance in data processing stems from the ways it differs from other procedural languages. Taking the implementation of the classic K-means algorithm as an entry point, this paper implements the same clustering process in different programming styles and runs the resulting programs on UCI data sets and on generated data sets. The results show that using the NumPy data processing library significantly improves running efficiency and reduces running time, which demonstrates the advantage of vectorized data computation in Python.
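The contrast described in the abstract, the same K-means step written with explicit loops and with NumPy array operations, can be sketched roughly as follows. This is a minimal illustration and not the paper's code: the function names (`assign_loop`, `assign_numpy`, `update_centroids`, `kmeans`), the synthetic data, the iteration count, and the random initialization are all assumptions made for the example.

```python
import time

import numpy as np


def assign_loop(points, centroids):
    """Assign each point to its nearest centroid using plain Python loops."""
    labels = []
    for p in points:
        best_label, best_dist = 0, float("inf")
        for k, c in enumerate(centroids):
            dist = 0.0
            for a, b in zip(p, c):
                dist += (a - b) ** 2  # squared Euclidean distance
            if dist < best_dist:
                best_label, best_dist = k, dist
        labels.append(best_label)
    return labels


def assign_numpy(points, centroids):
    """Same assignment step, vectorized with NumPy broadcasting."""
    # diff has shape (n_points, n_clusters, n_features)
    diff = points[:, None, :] - centroids[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        members = points[labels == k]
        if len(members) > 0:  # keep the old centroid if a cluster is empty
            new_centroids[k] = members.mean(axis=0)
    return new_centroids


def kmeans(points, k, n_iter=20, seed=0):
    """Hypothetical driver: fixed iteration count, random initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_numpy(points, centroids)
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.normal(size=(5000, 4))  # synthetic data, not the paper's data sets
    init = data[:3].copy()

    t0 = time.perf_counter()
    loop_labels = assign_loop(data.tolist(), init.tolist())
    t1 = time.perf_counter()
    numpy_labels = assign_numpy(data, init)
    t2 = time.perf_counter()

    print(f"loop: {t1 - t0:.4f}s  numpy: {t2 - t1:.4f}s")
    print("same labels:", np.array_equal(loop_labels, numpy_labels))
```

Both assignment functions compute the same squared Euclidean distances; the NumPy version replaces three nested Python loops with a single broadcast subtraction, which is where a speedup of the kind the abstract reports would be expected to come from. Actual timings depend on the machine and the data size.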
Author
WANG Xi-tao (王习涛)
Data Management Center, Henan Provincial Bureau of Statistics, Zhengzhou, Henan 410018
Source
Software (《软件》)
2020, No. 8, pp. 87-88, 128 (3 pages)