ICLR Poster Few-Shot Learning via Learning the Representation, Provably

Simon Du · Wei Hu · Sham M Kakade · Jason Lee · Qi Lei

Virtual

Keywords: [ representation learning ] [ statistical learning theory ]

Abstract: This paper studies few-shot learning via representation learning, where one uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task for which there is only $n_2 (\ll n_1)$ data. Specifically, we focus on the setting where there exists a good common representation between source and target, and our goal is to understand how much of a sample size reduction is possible. First, we study the setting where this common representation is low-dimensional and provide a risk bound of $\tilde{O}\left(\frac{dk}{n_1 T} + \frac{k}{n_2}\right)$ on the target task for the linear representation class; here $d$ is the ambient input dimension and $k (\ll d)$ is the dimension of the representation. This result bypasses the $\Omega\left(\frac{1}{T}\right)$ barrier under the i.i.d. task assumption, and can capture the desired property that all $n_1 T$ samples from source tasks can be pooled together for representation learning. We further extend this result to handle a general representation function class and obtain a similar result. Next, we consider the setting where the common representation may be high-dimensional but is capacity-constrained (say in norm); here, we again demonstrate the advantage of representation learning in both high-dimensional linear regression and neural networks, and show that representation learning can fully utilize all $n_1 T$ samples from source tasks.
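
The low-dimensional linear setting described in the abstract can be illustrated with a small simulation. The sketch below is not the estimator analyzed in the paper: as a simplifying assumption, it recovers the shared subspace from the top-$k$ singular vectors of per-task least-squares estimates, which requires $n_1 \ge d$. The function name `simulate`, the noise level, and all default sizes are hypothetical choices made for illustration. With isotropic Gaussian inputs (also an assumption here), the excess risk of a linear predictor equals its squared parameter error, so the two returned numbers are directly comparable.

```python
import numpy as np


def simulate(d=50, k=3, T=40, n1=200, n2=15, noise=0.1, seed=0):
    """Compare few-shot regression with and without a learned representation.

    Returns (risk_rep, risk_base): excess risk on the target task when
    regressing on the learned k-dim features vs. directly in d dims.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: a shared orthonormal d x k representation B.
    B, _ = np.linalg.qr(rng.standard_normal((d, k)))

    # Source tasks: y = x^T B w_t + noise, with n1 samples each.
    theta_hat = np.empty((T, d))
    for t in range(T):
        w_t = rng.standard_normal(k)
        X = rng.standard_normal((n1, d))
        y = X @ B @ w_t + noise * rng.standard_normal(n1)
        # Per-task least squares (a simple surrogate; needs n1 >= d).
        theta_hat[t] = np.linalg.lstsq(X, y, rcond=None)[0]

    # The top-k right singular vectors of the stacked estimates
    # span the estimated shared subspace.
    _, _, Vt = np.linalg.svd(theta_hat, full_matrices=False)
    B_hat = Vt[:k].T  # shape (d, k)

    # Target task with only n2 samples (n2 < d).
    theta_star = B @ rng.standard_normal(k)
    X2 = rng.standard_normal((n2, d))
    y2 = X2 @ theta_star + noise * rng.standard_normal(n2)

    # Few-shot fit: regress on the k learned features only.
    w_hat = np.linalg.lstsq(X2 @ B_hat, y2, rcond=None)[0]
    theta_rep = B_hat @ w_hat

    # Baseline: minimum-norm least squares in the full d dimensions.
    theta_base = np.linalg.lstsq(X2, y2, rcond=None)[0]

    # For isotropic Gaussian x, excess risk = squared parameter error.
    risk_rep = float(np.sum((theta_rep - theta_star) ** 2))
    risk_base = float(np.sum((theta_base - theta_star) ** 2))
    return risk_rep, risk_base


if __name__ == "__main__":
    risk_rep, risk_base = simulate()
    print(f"with representation: {risk_rep:.4f}   from scratch: {risk_base:.4f}")
```

In this toy setup the target regression only has to estimate $k$ coefficients from $n_2$ samples, which mirrors the $\frac{k}{n_2}$ term in the bound, while the quality of the estimated subspace depends on the source data, corresponding to the $\frac{dk}{n_1 T}$ term.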