

Poster

Beyond Worst-Case Dimensionality Reduction for Sparse Vectors

Sandeep Silwal · David Woodruff · Qiuyi (Richard) Zhang

Hall 3 + Hall 2B #437
Sat 26 Apr midnight PDT — 2:30 a.m. PDT

Abstract: We study beyond worst-case dimensionality reduction for $s$-sparse vectors (vectors with at most $s$ non-zero coordinates). Our work is divided into two parts, each focusing on a different facet of beyond worst-case analysis:

(a) We first consider average-case guarantees for embedding $s$-sparse vectors. Here, a well-known folklore upper bound based on the birthday paradox states: for any collection $X$ of $s$-sparse vectors in $\mathbb{R}^d$, there exists a linear map $A: \mathbb{R}^d \to \mathbb{R}^{O(s^2)}$ which \emph{exactly} preserves the norm of 99% of the vectors in $X$ in any $\ell_p$ norm (as opposed to the usual setting where guarantees hold for all vectors). We provide novel lower bounds showing that this is indeed optimal in many settings. Specifically, any oblivious linear map satisfying similar average-case guarantees must map to $\Omega(s^2)$ dimensions. The same lower bound also holds for a wider class of sufficiently smooth maps, including `encoder-decoder schemes', where we compare the norm of the original vector to that of a smooth function of the embedding. These lower bounds reveal a surprising separation result for smooth embeddings of sparse vectors, as an upper bound of $O(s \log d)$ is possible if we instead use arbitrary functions, e.g., via compressed sensing algorithms.

(b) Given these lower bounds, we specialize to sparse \emph{non-negative} vectors in hopes of improved upper bounds. For a dataset $X$ of non-negative $s$-sparse vectors and any $p \ge 1$, we can non-linearly embed $X$ into $O(s \log(|X|s)/\varepsilon^2)$ dimensions while preserving all pairwise distances in the $\ell_p$ norm up to $1 \pm \varepsilon$, with no dependence on $p$. Surprisingly, the non-negativity assumption enables much smaller embeddings than are possible for arbitrary sparse vectors, where the best known bound suffers an exponential $(\log |X|)^{O(p)}$ dependence. Our map also guarantees \emph{exact} dimensionality reduction for the $\ell_\infty$ norm by embedding $X$ into $O(s \log |X|)$ dimensions, which is tight. We further give separation results showing that both the non-linearity of our map and the non-negativity of $X$ are necessary, and provide downstream algorithmic improvements using our embedding.
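As a concrete illustration of the folklore birthday-paradox upper bound from part (a), here is a minimal Python sketch (not taken from the paper; all names and constants are illustrative): hash each of the $d$ coordinates to one of $m = O(s^2)$ buckets and sum the coordinates that land in each bucket. This is a linear map, and whenever the $s$ non-zero coordinates of a vector fall into distinct buckets (an event with probability at least $1 - s^2/(2m)$, by a union bound over pairs), every $\ell_p$ norm of that vector is preserved exactly.

```python
import numpy as np

# Illustrative sketch of the folklore birthday-paradox embedding for
# s-sparse vectors: hash each coordinate i in [d] to a bucket h(i) in [m]
# and sum within buckets.  This is the linear map A with A[h(i), i] = 1.
# If the s non-zero coordinates of x land in distinct buckets
# (probability at least 1 - s^2/(2m)), then ||Ax||_p = ||x||_p for all p.

def make_hash(d, s, fail_prob=0.01, seed=None):
    rng = np.random.default_rng(seed)
    m = int(np.ceil(s * s / (2 * fail_prob)))  # ensures s^2/(2m) <= fail_prob
    return rng.integers(0, m, size=d), m       # random hash h: [d] -> [m]

def embed(x, h, m):
    y = np.zeros(m)
    np.add.at(y, h, x)  # y[h(i)] += x[i], i.e. y = A x
    return y

d, s = 100_000, 20
rng = np.random.default_rng(0)
h, m = make_hash(d, s, seed=1)

# Random s-sparse test vector.
x = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
x[support] = rng.standard_normal(s)

y = embed(x, h, m)
for p in (1, 2, np.inf):
    print(f"p={p}: ||x||_p = {np.linalg.norm(x, p):.6f}, "
          f"||Ax||_p = {np.linalg.norm(y, p):.6f}")
```

With these parameters $m = 20{,}000$, so for a fixed vector the collision probability is at most $\binom{20}{2}/20000 < 1\%$; by averaging over a collection $X$, a single random hash preserves the norms of 99% of $X$ in expectation. The lower bounds in part (a) show that the $O(s^2)$ target dimension of this construction cannot be improved by any oblivious linear (or sufficiently smooth) map.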
