Funding: This work was supported by the National Natural Science Foundation of China (Nos. 12131003, 11771386, and 11728104), the Beijing Natural Science Foundation Project (No. Z200002), the General Research Projects of Beijing Educations Committee in China (No. KM201910005013), the Natural Sciences and Engineering Research Council of Canada (NSERC) (No. 06446), and the General Program of Science and Technology Development Project of Beijing Municipal Education Commission (No. KM201810005005).
Abstract: Clustering of big data has been widely studied in the era of artificial intelligence. Because of the huge amount of data, distributed algorithms are often used for big data problems. The distributed computing model has an attractive feature: it can handle massive datasets that do not fit into main memory. On the other hand, since many decisions in today's society are made automatically by machines, algorithmic fairness is also an important research area in machine learning. In this paper, we study two fair clustering problems: the centralized fair k-center problem with outliers and the distributed fair k-center problem with outliers. For both problems, we design constant-factor approximation algorithms. We give the theoretical proof and analysis of the approximation ratio, as well as the space used by the algorithms.
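For orientation, the k-center objective with outliers can be illustrated with a classical baseline. The sketch below is not the algorithm of the paper above: it uses the well-known farthest-first greedy heuristic for plain k-center and then evaluates the radius after discarding the z farthest points. The fairness constraint and the distributed setting are not modeled, and the function names and sample points are hypothetical.

```python
import math

def farthest_first_centers(points, k, first=0):
    """Greedy k-center baseline: repeatedly add the point farthest from the chosen centers."""
    centers = [points[first]]
    # dist[i] is the distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers

def radius_with_outliers(points, centers, z):
    """k-center cost after discarding the z points farthest from their nearest center.

    Assumes 0 <= z < len(points).
    """
    d = sorted(min(math.dist(p, c) for c in centers) for p in points)
    return d[len(d) - z - 1]

# Hypothetical data: two small groups plus one far-away point
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]
centers = farthest_first_centers(points, k=2)
cost_no_outliers = radius_with_outliers(points, centers, z=0)
cost_one_outlier = radius_with_outliers(points, centers, z=1)
```

On this sample the greedy rule spends its second center on the isolated point (50, 50), which shows why outliers need dedicated handling rather than a plain k-center heuristic.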
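Abstract: To make full use of real traffic congestion information for expressway sections and to cluster the underlying patterns and feature changes of traffic congestion more reasonably, a clustering algorithm that adaptively determines the cluster centers C and the number of clusters K (adaptive center and K-means value, ACK-Means) is proposed and applied to clustering congested expressway sections. ACK-Means relies on cluster density, inter-cluster distance, and cluster strength, and also accounts for the randomness of data samples by assigning outliers in a reasoned way, so that the cluster centers C and the number of clusters K are determined adaptively. A dataset is built from real traffic congestion information, and ACK-Means clustering of congested expressway sections is implemented in Python, which resolves the problem of setting the number of clusters K and the cluster centers C. The clustering results show that ACK-Means achieves unsupervised clustering of congested expressway sections; the results are based entirely on real expressway congestion information and are therefore more practical.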
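Abstract: When the K-means algorithm is used to zone surface wind loads on long-span roof structures, the number of classes k must be set in advance from experience and all initial cluster centers are chosen at random. This produces too many candidate classifications and makes the search for the optimal one laborious and inefficient. To address this, a wind-load zoning method for long-span roof surfaces based on an improved K-means algorithm is proposed. First, the curve relating the number of classes k to the sum of the squared errors (SSE) of the wind loads at the corresponding measurement points is built, and the basic idea of the elbow method is introduced to identify the optimal number of classes kst accurately. Second, with the first initial cluster center chosen at random, the basic idea of the roulette method is introduced to select the remaining initial cluster centers efficiently. Then, following the principle of compactness within classes and separation between classes, the within-class compactness index S(k) and the between-class separation index D(k) are used to construct an SD(k) validity test, which yields the optimal wind-load zoning result. Finally, taking the long-span cantilevered roof of the Beijing Olympic Tennis Center as an example, the proposed method is applied to wind-load results from wind tunnel tests to zone the most unfavorable surface wind pressure coefficients, and is compared with the traditional K-means algorithm. The results show that the proposed method zones surface wind pressure on long-span roof structures efficiently and has good engineering application value.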
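The first two steps of this abstract (an SSE-versus-k curve read with the elbow method, and roulette-style selection of the remaining initial centers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the roulette weights are assumed to be squared distances to the nearest chosen center (as in k-means++), the elbow is assumed to be the k with the largest second difference of the SSE curve, and all function names are hypothetical. The SD(k) validity test is not reproduced because the abstract does not define S(k) and D(k).

```python
import random

def nearest_sq(p, centers):
    """Squared Euclidean distance from p to its nearest center."""
    return min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)

def roulette_seed(points, k, rng):
    """First center uniformly at random; the rest by roulette-wheel selection.

    Assumption: selection probability is proportional to the squared distance
    to the nearest already-chosen center. Needs at least k distinct points.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [nearest_sq(p, centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans_sse(points, k, rng, iters=20):
    """Lloyd iterations from roulette-seeded centers; returns (centers, SSE)."""
    centers = roulette_seed(points, k, rng)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(k), key=lambda i: nearest_sq(p, [centers[i]]))
            groups[j].append(p)
        # Move each center to the mean of its group; keep it if the group is empty
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers, sum(nearest_sq(p, centers) for p in points)

def elbow_k(sse_by_k):
    """Pick the interior k with the largest second difference of the SSE curve.

    Assumes sse_by_k has consecutive integer keys and at least three entries.
    """
    ks = sorted(sse_by_k)
    return max(ks[1:-1],
               key=lambda k: sse_by_k[k - 1] - 2 * sse_by_k[k] + sse_by_k[k + 1])

# Hypothetical usage on synthetic 2-D points
rng = random.Random(0)
data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
sse_curve = {k: kmeans_sse(data, k, rng)[1] for k in range(1, 7)}
best_k = elbow_k(sse_curve)
```

Seeding by roulette still leaves the result dependent on the random draws, so in practice each k would typically be run several times and the lowest SSE kept before reading the elbow.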