The k-means algorithm is a popular data clustering technique due to its speed and simplicity. However, it is susceptible to issues such as sensitivity to the chosen seeds, and inaccurate clusters due to poor initial s...The k-means algorithm is a popular data clustering technique due to its speed and simplicity. However, it is susceptible to issues such as sensitivity to the chosen seeds, and inaccurate clusters due to poor initial seeds, particularly in complex datasets or datasets with non-spherical clusters. In this paper, a Comprehensive K-Means Clustering algorithm is presented, in which multiple trials of k-means are performed on a given dataset. The clustering results from each trial are transformed into a five-dimensional data point, containing the scope values of the x and y coordinates of the clusters along with the number of points within that cluster. A graph is then generated displaying the configuration of these points using Principal Component Analysis (PCA), from which we can observe and determine the common clustering patterns in the dataset. The robustness and strength of these patterns are then examined by observing the variance of the results of each trial, wherein a different subset of the data keeping a certain percentage of original data points is clustered. By aggregating information from multiple trials, we can distinguish clusters that consistently emerge across different runs from those that are more sensitive or unlikely, hence deriving more reliable conclusions about the underlying structure of complex datasets. Our experiments show that our algorithm is able to find the most common associations between different dimensions of data over multiple trials, often more accurately than other algorithms, as well as measure stability of these clusters, an ability that other k-means algorithms lack.展开更多
Gobi spans a large area of China,surpassing the combined expanse of mobile dunes and semi-fixed dunes.Its presence significantly influences the movement of sand and dust.However,the complex origins and diverse materia...Gobi spans a large area of China,surpassing the combined expanse of mobile dunes and semi-fixed dunes.Its presence significantly influences the movement of sand and dust.However,the complex origins and diverse materials constituting the Gobi result in notable differences in saltation processes across various Gobi surfaces.It is challenging to describe these processes according to a uniform morphology.Therefore,it becomes imperative to articulate surface characteristics through parameters such as the three-dimensional(3D)size and shape of gravel.Collecting morphology information for Gobi gravels is essential for studying its genesis and sand saltation.To enhance the efficiency and information yield of gravel parameter measurements,this study conducted field experiments in the Gobi region across Dunhuang City,Guazhou County,and Yumen City(administrated by Jiuquan City),Gansu Province,China in March 2023.A research framework and methodology for measuring 3D parameters of gravel using point cloud were developed,alongside improved calculation formulas for 3D parameters including gravel grain size,volume,flatness,roundness,sphericity,and equivalent grain size.Leveraging multi-view geometry technology for 3D reconstruction allowed for establishing an optimal data acquisition scheme characterized by high point cloud reconstruction efficiency and clear quality.Additionally,the proposed methodology incorporated point cloud clustering,segmentation,and filtering techniques to isolate individual gravel point clouds.Advanced point cloud algorithms,including the Oriented Bounding Box(OBB),point cloud slicing method,and point cloud triangulation,were then deployed to calculate the 3D parameters of individual gravels.These systematic processes allow precise and detailed characterization of individual gravels.For gravel grain size and volume,the correlation coefficients between point cloud and manual measurements all exceeded 0.9000,confirming the feasibility of the proposed methodology for measuring 3D parameters of individual gravels.The proposed workflow yields accurate calculations of relevant parameters for Gobi gravels,providing essential data support for subsequent studies on Gobi environments.展开更多
The COVID-19 pandemic has caused an unprecedented spike in confirmed cases in 230 countries globally. In this work, a set of data from the COVID-19 coronavirus outbreak has been subjected to two well-known unsupervise...The COVID-19 pandemic has caused an unprecedented spike in confirmed cases in 230 countries globally. In this work, a set of data from the COVID-19 coronavirus outbreak has been subjected to two well-known unsupervised learning techniques: K-means clustering and correlation. The COVID-19 virus has infected several nations, and K-means automatically looks for undiscovered clusters of those infections. To examine the spread of COVID-19 before a vaccine becomes widely available, this work has used unsupervised approaches to identify the crucial county-level confirmed cases, death cases, recover cases, total_cases_per_million, and total_deaths_per_million aspects of county-level variables. We combined countries into significant clusters using this feature subspace to assist more in-depth disease analysis efforts. As a result, we used a clustering technique to examine various trends in COVID-19 incidence and mortality across nations. This technique took the key components of a trajectory and incorporates them into a K-means clustering process. We separated the trend lines into measures that characterize various features of a trend. The measurements were first reduced in dimension, then clustered using a K-means algorithm. This method was used to individually calculate the incidence and death rates and then compare them.展开更多
A novel model of fuzzy clustering, i.e. an allied fuzzy c means (AFCM) model is proposed based on the combination of advantages of fuzzy c means (FCM) and possibilistic c means (PCM) clustering. PCM is sensitive...A novel model of fuzzy clustering, i.e. an allied fuzzy c means (AFCM) model is proposed based on the combination of advantages of fuzzy c means (FCM) and possibilistic c means (PCM) clustering. PCM is sensitive to initializations and often generates coincident clusters. AFCM overcomes this shortcoming and it is an ex tension of PCM. Membership and typicality values can be simultaneously produced in AFCM. Experimental re- suits show that noise data can be well processed, coincident clusters are avoided and clustering accuracy is better.展开更多
针对电池储能系统(battery energy storage system,BESS)进行光伏波动平抑时寿命损耗高及荷电状态(state of charge,SOC)一致性差的问题,提出了光伏波动平抑下改进K-means的BESS动态分组控制策略。首先,采用最小最大调度方法获取光伏并...针对电池储能系统(battery energy storage system,BESS)进行光伏波动平抑时寿命损耗高及荷电状态(state of charge,SOC)一致性差的问题,提出了光伏波动平抑下改进K-means的BESS动态分组控制策略。首先,采用最小最大调度方法获取光伏并网指令。其次,设计了改进侏儒猫鼬优化算法(improved dwarf mongoose optimizer,IDMO),并利用它对传统K-means聚类算法进行改进,加快了聚类速度。接着,制定了电池单元动态分组原则,并根据电池单元SOC利用改进K-means将其分为3个电池组。然后,设计了基于充放电函数的电池单元SOC一致性功率分配方法,并据此提出BESS双层功率分配策略,上层确定电池组充放电顺序及指令,下层计算电池单元充放电指令。对所提策略进行仿真验证,结果表明,所设计的IDMO具有更高的寻优精度及更快的寻优速度。所提BESS平抑光伏波动策略在有效平抑波动的同时,降低了BESS运行寿命损耗并提高了电池单元SOC的均衡性。展开更多
为了充分利用实际高速公路路段交通拥堵信息,更合理地聚类交通拥堵的内在规律和特征变化,提出自适应确定聚类中心C和类别K值(adaptive center and K-means value,ACK-Means)的聚类算法,进行高速公路拥堵路段聚类。ACK-Means算法借助簇...为了充分利用实际高速公路路段交通拥堵信息,更合理地聚类交通拥堵的内在规律和特征变化,提出自适应确定聚类中心C和类别K值(adaptive center and K-means value,ACK-Means)的聚类算法,进行高速公路拥堵路段聚类。ACK-Means算法借助簇类密度、簇类间距以及簇类强度,同时又考虑到数据样本的偶然性,对离群点进行合理分配,ACK-Means算法可实现自适应确定聚类中心C和类别K值。基于实际交通拥堵信息构建数据集,Python编程实现高速公路拥堵路段ACK-Means聚类,巧妙解决了高速公路拥堵路段聚类数目K和聚类中心C设定问题。聚类结果表明,ACK-Means算法实现高速公路拥堵路段无监督聚类,聚类结果完全基于实际的高速公路交通拥堵信息,具有更高的实用性。展开更多
受限于自然条件,光伏出力具有很强的随机性。为准确评估轨道交通基础设施分布式光伏发电的光伏出力特性,提出一种基于改进K-means聚类算法的轨道交通基础设施分布式光伏发电典型场景生成方法,并基于此进行光伏出力特性分析。首先,基于...受限于自然条件,光伏出力具有很强的随机性。为准确评估轨道交通基础设施分布式光伏发电的光伏出力特性,提出一种基于改进K-means聚类算法的轨道交通基础设施分布式光伏发电典型场景生成方法,并基于此进行光伏出力特性分析。首先,基于分布式光伏发电设施以及气象数据,利用PVsyst软件模拟光伏发电出力数据。然后,针对基本K-means聚类算法聚类参数和初始聚类中心盲目性高的问题,结合聚类有效性指标(Density based index,DBI)和层次聚类对其进行改进并利用改进K-means聚类算法生成光伏典型日出力场景。最后,基于华中地区某地轨道交通基础设施分布式光伏系统对所提方法的有效性和优越性进行验证,并通过定性和定量分析各典型场景的出力特性揭示轨道交通基础设施分布式光伏出力的规律和特点。展开更多
文摘The k-means algorithm is a popular data clustering technique due to its speed and simplicity. However, it is susceptible to issues such as sensitivity to the chosen seeds, and inaccurate clusters due to poor initial seeds, particularly in complex datasets or datasets with non-spherical clusters. In this paper, a Comprehensive K-Means Clustering algorithm is presented, in which multiple trials of k-means are performed on a given dataset. The clustering results from each trial are transformed into a five-dimensional data point, containing the scope values of the x and y coordinates of the clusters along with the number of points within that cluster. A graph is then generated displaying the configuration of these points using Principal Component Analysis (PCA), from which we can observe and determine the common clustering patterns in the dataset. The robustness and strength of these patterns are then examined by observing the variance of the results of each trial, wherein a different subset of the data keeping a certain percentage of original data points is clustered. By aggregating information from multiple trials, we can distinguish clusters that consistently emerge across different runs from those that are more sensitive or unlikely, hence deriving more reliable conclusions about the underlying structure of complex datasets. Our experiments show that our algorithm is able to find the most common associations between different dimensions of data over multiple trials, often more accurately than other algorithms, as well as measure stability of these clusters, an ability that other k-means algorithms lack.
基金funded by the National Natural Science Foundation of China(42071014).
文摘Gobi spans a large area of China,surpassing the combined expanse of mobile dunes and semi-fixed dunes.Its presence significantly influences the movement of sand and dust.However,the complex origins and diverse materials constituting the Gobi result in notable differences in saltation processes across various Gobi surfaces.It is challenging to describe these processes according to a uniform morphology.Therefore,it becomes imperative to articulate surface characteristics through parameters such as the three-dimensional(3D)size and shape of gravel.Collecting morphology information for Gobi gravels is essential for studying its genesis and sand saltation.To enhance the efficiency and information yield of gravel parameter measurements,this study conducted field experiments in the Gobi region across Dunhuang City,Guazhou County,and Yumen City(administrated by Jiuquan City),Gansu Province,China in March 2023.A research framework and methodology for measuring 3D parameters of gravel using point cloud were developed,alongside improved calculation formulas for 3D parameters including gravel grain size,volume,flatness,roundness,sphericity,and equivalent grain size.Leveraging multi-view geometry technology for 3D reconstruction allowed for establishing an optimal data acquisition scheme characterized by high point cloud reconstruction efficiency and clear quality.Additionally,the proposed methodology incorporated point cloud clustering,segmentation,and filtering techniques to isolate individual gravel point clouds.Advanced point cloud algorithms,including the Oriented Bounding Box(OBB),point cloud slicing method,and point cloud triangulation,were then deployed to calculate the 3D parameters of individual gravels.These systematic processes allow precise and detailed characterization of individual gravels.For gravel grain size and volume,the correlation coefficients between point cloud and manual measurements all exceeded 0.9000,confirming the feasibility of the proposed methodology for measuring 3D parameters of individual gravels.The proposed workflow yields accurate calculations of relevant parameters for Gobi gravels,providing essential data support for subsequent studies on Gobi environments.
文摘The COVID-19 pandemic has caused an unprecedented spike in confirmed cases in 230 countries globally. In this work, a set of data from the COVID-19 coronavirus outbreak has been subjected to two well-known unsupervised learning techniques: K-means clustering and correlation. The COVID-19 virus has infected several nations, and K-means automatically looks for undiscovered clusters of those infections. To examine the spread of COVID-19 before a vaccine becomes widely available, this work has used unsupervised approaches to identify the crucial county-level confirmed cases, death cases, recover cases, total_cases_per_million, and total_deaths_per_million aspects of county-level variables. We combined countries into significant clusters using this feature subspace to assist more in-depth disease analysis efforts. As a result, we used a clustering technique to examine various trends in COVID-19 incidence and mortality across nations. This technique took the key components of a trajectory and incorporates them into a K-means clustering process. We separated the trend lines into measures that characterize various features of a trend. The measurements were first reduced in dimension, then clustered using a K-means algorithm. This method was used to individually calculate the incidence and death rates and then compare them.
文摘A novel model of fuzzy clustering, i.e. an allied fuzzy c means (AFCM) model is proposed based on the combination of advantages of fuzzy c means (FCM) and possibilistic c means (PCM) clustering. PCM is sensitive to initializations and often generates coincident clusters. AFCM overcomes this shortcoming and it is an ex tension of PCM. Membership and typicality values can be simultaneously produced in AFCM. Experimental re- suits show that noise data can be well processed, coincident clusters are avoided and clustering accuracy is better.
文摘针对电池储能系统(battery energy storage system,BESS)进行光伏波动平抑时寿命损耗高及荷电状态(state of charge,SOC)一致性差的问题,提出了光伏波动平抑下改进K-means的BESS动态分组控制策略。首先,采用最小最大调度方法获取光伏并网指令。其次,设计了改进侏儒猫鼬优化算法(improved dwarf mongoose optimizer,IDMO),并利用它对传统K-means聚类算法进行改进,加快了聚类速度。接着,制定了电池单元动态分组原则,并根据电池单元SOC利用改进K-means将其分为3个电池组。然后,设计了基于充放电函数的电池单元SOC一致性功率分配方法,并据此提出BESS双层功率分配策略,上层确定电池组充放电顺序及指令,下层计算电池单元充放电指令。对所提策略进行仿真验证,结果表明,所设计的IDMO具有更高的寻优精度及更快的寻优速度。所提BESS平抑光伏波动策略在有效平抑波动的同时,降低了BESS运行寿命损耗并提高了电池单元SOC的均衡性。
文摘为了充分利用实际高速公路路段交通拥堵信息,更合理地聚类交通拥堵的内在规律和特征变化,提出自适应确定聚类中心C和类别K值(adaptive center and K-means value,ACK-Means)的聚类算法,进行高速公路拥堵路段聚类。ACK-Means算法借助簇类密度、簇类间距以及簇类强度,同时又考虑到数据样本的偶然性,对离群点进行合理分配,ACK-Means算法可实现自适应确定聚类中心C和类别K值。基于实际交通拥堵信息构建数据集,Python编程实现高速公路拥堵路段ACK-Means聚类,巧妙解决了高速公路拥堵路段聚类数目K和聚类中心C设定问题。聚类结果表明,ACK-Means算法实现高速公路拥堵路段无监督聚类,聚类结果完全基于实际的高速公路交通拥堵信息,具有更高的实用性。
文摘受限于自然条件,光伏出力具有很强的随机性。为准确评估轨道交通基础设施分布式光伏发电的光伏出力特性,提出一种基于改进K-means聚类算法的轨道交通基础设施分布式光伏发电典型场景生成方法,并基于此进行光伏出力特性分析。首先,基于分布式光伏发电设施以及气象数据,利用PVsyst软件模拟光伏发电出力数据。然后,针对基本K-means聚类算法聚类参数和初始聚类中心盲目性高的问题,结合聚类有效性指标(Density based index,DBI)和层次聚类对其进行改进并利用改进K-means聚类算法生成光伏典型日出力场景。最后,基于华中地区某地轨道交通基础设施分布式光伏系统对所提方法的有效性和优越性进行验证,并通过定性和定量分析各典型场景的出力特性揭示轨道交通基础设施分布式光伏出力的规律和特点。