摘要
K-Means算法在处理大规模异构数据时,通常使用欧氏距离来衡量数据点之间的相似度,然而这样存在效率低下以及计算复杂性过高的问题。受到汉明距离在处理数据相似性计算上存在显著优势的启发,提出一种基于汉明距离的量子K-Means(QKMH)算法来计算相似度。首先,将数据制备成量子态,并使用量子汉明距离计算待聚类点和K个聚类中心之间的相似度;然后,改进了Grover最小值搜索算法查找距离待聚类点最近的聚类中心;最后,循环以上步骤,直到达到规定迭代次数或者聚类中心不再改变。基于量子模拟计算框架QisKit,将提出的算法在MNIST手写数字数据集上进行了验证并与传统和改进的多种方法进行了对比,实验结果表明,QKMH算法的F1值相较于基于曼哈顿距离的量子K-Means算法提高了10个百分点,相较于最新优化的基于欧氏距离的量子K-Means算法提高了4.6个百分点;同时经计算,QKMH算法时间复杂度比上述对比算法更低。
The K-Means algorithms typically utilize Euclidean distance to calculate the similarity between data points when dealing with large-scale heterogeneous data.However,this method has problems of low efficiency and high computational complexity.Inspired by the significant advantage of Hamming distance in handling data similarity calculation,a Quantum K-Means Hamming(QKMH) algorithm was proposed to calculate similarity.First,the data was prepared and made into quantum state,and the quantum Hamming distance was used to calculate similarity between the points to be clustered and the K cluster centers.Then,the Grover's minimum search algorithm was improved to find the cluster center closest to the points to be clustered.Finally,these steps were repeated until the designated number of iterations was reached or the clustering centers no longer changed.Based on the quantum simulation computing framework QisKit,the proposed algorithm was validated on the MNIST handwritten digit dataset and compared with various traditional and improved methods.Experimental results show that the F1 score of the QKMH algorithm is improved by 10 percentage points compared with that of the Manhattan distance-based quantum K-Means algorithm and by 4.6 percentage points compared with that of the latest optimized Euclidean distance-based quantum K-Means algorithm,and the time complexity of the QKMH algorithm is lower than those of the above comparison algorithms.
作者
钟静
林晨
盛志伟
张仕斌
ZHONG Jing;LIN Chen;SHENG Zhiwei;ZHANG Shibin(School of Cybersecurity,Chengdu University of Information Technology,Chengdu Sichuan 610225,China)
出处
《计算机应用》
CSCD
北大核心
2023年第8期2493-2498,共6页
journal of Computer Applications
基金
国家自然科学基金资助项目(62076042)。