

Spotlight

Learning a Latent Simplex in Input Sparsity Time

Ainesh Bakshi · Chiranjib Bhattacharyya · Ravi Kannan · David Woodruff · Samson Zhou

Abstract: We consider the problem of learning a latent k-vertex simplex K ⊆ ℝ^d, given A ∈ ℝ^{d×n}, which can be viewed as n data points formed by randomly perturbing latent points in K, possibly beyond K. A large class of latent variable models, such as adversarial clustering, mixed membership stochastic block models, and topic models, can be cast as learning a latent simplex. Bhattacharyya and Kannan (SODA 2020) give an algorithm for learning such a k-vertex latent simplex in time roughly O(k · nnz(A)), where nnz(A) is the number of non-zeros in A. We show that the dependence on k in the running time is unnecessary given a natural assumption about the mass of the top k singular values of A, which holds in many of these applications. Further, we show this assumption is necessary: otherwise, an algorithm for learning a latent simplex would imply a better low-rank approximation algorithm than what is known. We obtain a spectral low-rank approximation to A in input-sparsity time and show that the column space thus obtained has small sin Θ (angular) distance to the top-k right singular space of A. Our algorithm then selects k points in the low-rank subspace with the largest inner product (in absolute value) with k carefully chosen random vectors. By working in the low-rank subspace, we avoid reading the entire matrix in each iteration and thus circumvent the Θ(k · nnz(A)) running time.
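The abstract describes two phases: a spectral low-rank approximation of A computed in input-sparsity time, then a selection step that picks k candidate vertices inside the resulting low-dimensional subspace. The sketch below is a minimal illustration of that structure, not the paper's algorithm or code: the CountSketch-based range finder, the sketch size k + 10, the Gaussian selection directions, the toy data model, and the helper names sketched_column_space and select_candidate_vertices are all assumptions made for the example, and none of the paper's smoothing assumptions or accuracy guarantees are reproduced.

```python
import numpy as np
import scipy.sparse as sp

def sketched_column_space(A, k, oversample=10, rng=None):
    """Orthonormal d x k basis Q whose span approximates the top-k column
    space of A (d x n), via a CountSketch applied to the columns. Forming
    the sketch touches each nonzero of A once, i.e. roughly O(nnz(A)) time."""
    rng = np.random.default_rng(rng)
    d, n = A.shape
    m = k + oversample  # illustrative sketch size; theory needs more care
    # CountSketch S (m x n): each column index hashed to one bucket, random sign.
    buckets = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    S = sp.csr_matrix((signs, (buckets, np.arange(n))), shape=(m, n))
    Y = A @ S.T                                  # d x m, one pass over nnz(A)
    Y = Y.toarray() if sp.issparse(Y) else np.asarray(Y)
    Q, _ = np.linalg.qr(Y)                       # orthonormalize the range
    return Q[:, :k]

def select_candidate_vertices(A, Q, k, rng=None):
    """Pick k candidate vertices: project all n points into the k-dim
    subspace once, then for each of k random Gaussian directions take the
    point whose projection has the largest |inner product|. Each of the k
    rounds works on the k x n projection, never rereading A."""
    rng = np.random.default_rng(rng)
    P = np.asarray(A.T @ Q).T                    # k x n projected points
    chosen = []
    for _ in range(k):
        u = rng.standard_normal(Q.shape[1])      # random direction in subspace
        chosen.append(int(np.argmax(np.abs(u @ P))))
    return Q @ P[:, chosen]                      # lift selections back to R^d

# Toy usage (hypothetical data model): noisy convex combinations of k vertices.
rng = np.random.default_rng(0)
d, n, k = 200, 5000, 4
V = rng.standard_normal((d, k))                  # latent vertices of K
W = rng.dirichlet(np.full(k, 0.05), size=n).T    # weights concentrated near vertices
A = sp.csr_matrix(V @ W + 0.01 * rng.standard_normal((d, n)))
Q = sketched_column_space(A, k)
vertices_hat = select_candidate_vertices(A, Q, k)
print(vertices_hat.shape)                        # (200, 4)
```

The running-time point the abstract makes is visible in select_candidate_vertices: after the single projection P = QᵀA, each of the k selection rounds operates on a k × n matrix rather than on A itself, which is how the k · nnz(A) cost of reading A in every iteration is avoided.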
