Skip to yearly menu bar Skip to main content

In-Person Poster presentation / poster accept

Mini-batch $k$-means terminates within $O(d/\epsilon)$ iterations

Gregory Schwartzman

MH1-2-3-4 #140

Keywords: [ Theory ]

Abstract: We answer the question: "Does \emph{local} progress (on batches) imply \emph{global} progress (on the entire dataset) for mini-batch $k$-means?". Specifically, we consider mini-batch $k$-means which terminates only when the improvement in the quality of the clustering on the sampled batch is below some threshold.Although at first glance it appears that this algorithm might execute forever, we answer the above question in the affirmative and show that if the batch is of size $\tilde{\Omega}((d/\epsilon)^2)$, it must terminate within $O(d/\epsilon)$ iterations with high probability, where $d$ is the dimension of the input, and $\epsilon$ is a threshold parameter for termination. This is true \emph{regardless} of how the centers are initialized. When the algorithm is initialized with the $k$-means++ initialization scheme, it achieves an approximation ratio of $O(\log k)$ (the same as the full-batch version). Finally, we show the applicability of our results to the mini-batch $k$-means algorithm implemented in the scikit-learn (sklearn) python library.

Chat is not available.