Oral in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Prompting a Pretrained Transformer Can Be a Universal Approximator
Aleksandar Petrov · Adel Bibi · Philip Torr
Despite the widespread adoption of prompting, prompt tuning, and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it, i.e., whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer bounds on the length of the prefix needed to approximate a function to a desired precision.
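As an illustrative sketch only (not the paper's construction), the snippet below shows what prefix-tuning a single attention head amounts to: the pretrained query/key/value projections stay frozen, and only a short sequence of prefix key/value vectors prepended to the attention computation is trained. All names and dimensions (d_model, prefix_len, seq_len) are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

# Illustrative dimensions (assumptions, not from the paper).
d_model, prefix_len, seq_len = 16, 4, 8

# Frozen pretrained single-head projection weights.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

# Only the prefix key/value vectors are trainable.
prefix_k = torch.randn(prefix_len, d_model, requires_grad=True)
prefix_v = torch.randn(prefix_len, d_model, requires_grad=True)

def prefixed_attention(x):
    """Single-head attention where keys/values are [prefix; projected input]."""
    q = x @ W_q                                   # (seq_len, d_model)
    k = torch.cat([prefix_k, x @ W_k], dim=0)     # (prefix_len + seq_len, d_model)
    v = torch.cat([prefix_v, x @ W_v], dim=0)
    attn = F.softmax(q @ k.T / d_model**0.5, dim=-1)
    return attn @ v                               # (seq_len, d_model)

x = torch.randn(seq_len, d_model)
out = prefixed_attention(x)
# During prefix-tuning, an optimizer would update only the prefix, e.g.:
# torch.optim.Adam([prefix_k, prefix_v], lr=1e-3)
```

The paper's result concerns how expressive such a trained prefix can be; the sketch only fixes the mechanism being discussed.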