Funding: This work was supported by the National Natural Science Foundation of China (Nos. 12271098 and 61772005) and the Outstanding Youth Innovation Team Project for Universities of Shandong Province (No. 2020KJN008).
Abstract: Constrained clustering, such as k-means with instance-level Must-Link (ML) and Cannot-Link (CL) auxiliary information as constraints, has been extensively studied recently owing to its broad applications in data science and AI. Despite several heuristic approaches, no algorithm has so far provided a non-trivial approximation ratio for the constrained k-means problem. To address this issue, we propose an algorithm with a provable approximation ratio of O(log k) when only ML constraints are considered. We also empirically evaluate the performance of our algorithm on real-world datasets with artificial ML and disjoint CL constraints. The experimental results show that our algorithm outperforms the existing greedy-based heuristic methods in clustering accuracy.
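To make the ML and CL constraints concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. It assumes points are indexed 0..n-1 and constraints are given as index pairs; the function names `must_link_components`, `satisfies_constraints`, and `kmeans_cost` are hypothetical. It groups points that are transitively must-linked, checks whether a labeling respects both constraint types, and evaluates the standard k-means objective.

```python
from collections import defaultdict


def must_link_components(n, must_link):
    """Group indices 0..n-1 into connected components of the ML graph.

    ML is transitive, so every component must end up in one cluster.
    """
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


def satisfies_constraints(labels, must_link, cannot_link):
    """True if every ML pair shares a label and every CL pair does not."""
    ml_ok = all(labels[a] == labels[b] for a, b in must_link)
    cl_ok = all(labels[a] != labels[b] for a, b in cannot_link)
    return ml_ok and cl_ok


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return cost
```

A constrained k-means solver would search for the labeling of minimum `kmeans_cost` among those for which `satisfies_constraints` holds; with ML constraints only, each component returned by `must_link_components` can be treated as a single unit to be assigned.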
Abstract: Purpose – Constrained clustering is an important recent development in the clustering literature. The goal of a constrained clustering algorithm is to improve the quality of clustering by making use of background knowledge. The purpose of this paper is to suggest a new perspective on constrained clustering: finding an effective transformation of the data into a target space, guided by background knowledge given in the form of pairwise must-link and cannot-link constraints. Design/methodology/approach – Most existing methods in constrained clustering are limited to learning a distance metric or kernel matrix from the background knowledge while looking for a transformation of the data into the target space. Unlike previous efforts, the author presents a non-linear method for constrained clustering whose basic idea is to use a different non-linear function for each dimension of the target space. Findings – The outcome of the paper is a novel non-linear method for constrained clustering that uses a different non-linear function for each dimension of the target space. The method is formulated and explained for the particular case of quadratic functions. To reduce the number of optimization parameters, the method is modified to relax the quadratic function and approximate it by a factorized version that is easier to solve. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method. Originality/value – This study proposes a new direction for the problem of constrained clustering by learning a non-linear transformation of the data into the target space without using kernel functions. This work will help researchers develop new methods based on the proposed framework, which may in turn open up new research topics.
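The idea of one quadratic function per target dimension, and of replacing it with a factorized form that has fewer parameters, can be illustrated with the sketch below. The abstract does not give the exact parameterization, so everything here is an assumption: each target coordinate is taken to be x^T A x + b^T x + c, the factorized variant uses the rank-one form A = u u^T (d parameters instead of d^2), and the function names are hypothetical.

```python
def quadratic_feature(x, A, b, c):
    """Full quadratic function of x: x^T A x + b^T x + c."""
    d = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(d) for j in range(d))
    lin = sum(b[i] * x[i] for i in range(d))
    return quad + lin + c


def factorized_feature(x, u, b, c):
    """Rank-one variant (assumed): (u^T x)^2 + b^T x + c, i.e. A = u u^T."""
    proj = sum(ui * xi for ui, xi in zip(u, x))
    lin = sum(bi * xi for bi, xi in zip(b, x))
    return proj * proj + lin + c


def transform(x, params):
    """Map x into the target space, one (u, b, c) triple per target dimension."""
    return [factorized_feature(x, u, b, c) for u, b, c in params]


# Example: a 2-D input mapped to a 2-D target space.
x = [1.0, 2.0]
params = [([1.0, -1.0], [0.0, 1.0], 0.5), ([0.5, 0.5], [1.0, 0.0], 0.0)]
y = transform(x, params)
```

In a learning setting, the parameters of each target dimension would be fitted so that must-link pairs land close together and cannot-link pairs far apart in the target space, after which an ordinary clustering algorithm can be run on the transformed points.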