Poster
Matrix Product Sketching via Coordinated Sampling
Majid Daliri · Juliana Freire · Danrong Li · Christopher Musco
Hall 3 + Hall 2B #457
Wed 23 Apr, 7 p.m. – 9:30 p.m. PDT
Abstract:
We revisit the well-studied problem of approximating a matrix product, $\mathbf{A}^T\mathbf{B}$, from small-space sketches $\mathcal{S}(\mathbf{A})$ and $\mathcal{S}(\mathbf{B})$ of $\mathbf{A} \in \mathbb{R}^{n \times d}$ and $\mathbf{B}\in \mathbb{R}^{n \times m}$. We are interested in the setting where the sketches must be computed independently of each other, except for the use of a shared random seed. We prove that, when $\mathbf{A}$ and $\mathbf{B}$ are sparse, methods based on coordinated random sampling can outperform classical linear sketching approaches, like Johnson-Lindenstrauss projection or CountSketch. For example, to obtain Frobenius norm error $\epsilon\|\mathbf{A}\|_F\|\mathbf{B}\|_F$, coordinated sampling requires sketches of size $O(s/\epsilon^2)$ when $\mathbf{A}$ and $\mathbf{B}$ have at most $s \leq d,m$ non-zeros per row. In contrast, linear sketching leads to sketches of size $O(d/\epsilon^2)$ and $O(m/\epsilon^2)$ for $\mathbf{A}$ and $\mathbf{B}$, respectively. We empirically evaluate our approach on two applications: 1) distributed linear regression in databases, a problem motivated by tasks like dataset discovery and augmentation, and 2) approximating attention matrices in transformer-based language models. In both cases, our sampling algorithms yield an order of magnitude improvement over linear sketching.
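To make the coordinated-sampling idea concrete, here is a minimal NumPy sketch of one simple variant: each party independently samples rows of its matrix with probability proportional to squared row norm, but using a shared seed so that both parties make correlated keep/drop decisions per row index. Rows that survive in both sketches are rescaled by one over their joint inclusion probability, giving an unbiased estimate of $\mathbf{A}^T\mathbf{B}$. This is an illustrative threshold-sampling variant under our own assumptions, not the paper's exact algorithm (which uses a refined priority-sampling scheme); the function names `coordinated_sketch`, `estimate_product`, and the budget parameter `k` are hypothetical.

```python
import numpy as np

def coordinated_sketch(M, k, seed=0):
    """Keep rows of M with probability ~ squared row norm (capped at 1),
    deciding via a shared per-row hash h[i] so two parties sampling with
    the same seed keep overlapping row sets. Toy variant, not the paper's
    exact priority-sampling algorithm."""
    n = M.shape[0]
    rng = np.random.default_rng(seed)       # shared randomness across parties
    h = rng.uniform(size=n)                 # h[i] identical for both parties
    w = np.einsum('ij,ij->i', M, M)         # squared row norms
    p = np.minimum(1.0, k * w / w.sum())    # inclusion probabilities
    keep = h < p                            # coordinated keep/drop decisions
    return np.flatnonzero(keep), M[keep], p[keep]

def estimate_product(sketch_A, sketch_B):
    """Unbiased estimate of A^T B from two coordinated sketches."""
    idx_A, rows_A, p_A = sketch_A
    idx_B, rows_B, p_B = sketch_B
    # Rows sampled by BOTH parties; the shared hash makes this overlap large.
    common, ia, ib = np.intersect1d(idx_A, idx_B, return_indices=True)
    # Row i survives in both sketches iff h[i] < min(p_A[i], p_B[i]), so
    # dividing each outer-product term by that probability is unbiased.
    scale = 1.0 / np.minimum(p_A[ia], p_B[ib])
    return (rows_A[ia] * scale[:, None]).T @ rows_B[ib]

# Usage on synthetic sparse matrices; both sketches use the same seed.
n, d, m, k = 10_000, 50, 40, 800
rng = np.random.default_rng(1)
A = rng.standard_normal((n, d)) * (rng.uniform(size=(n, d)) < 0.1)
B = rng.standard_normal((n, m)) * (rng.uniform(size=(n, m)) < 0.1)
est = estimate_product(coordinated_sketch(A, k, seed=42),
                       coordinated_sketch(B, k, seed=42))
err = np.linalg.norm(est - A.T @ B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(f"relative Frobenius error: {err:.3f}")
```

The key design point the example exposes: because both parties hash row indices with the same seed, row $i$ appears in both samples whenever $h(i) < \min(p_i^A, p_i^B)$, so the intersection is much larger than it would be under independent sampling, and each shared row's contribution can be reweighted exactly.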