Poster in Workshop: 5th Workshop on practical ML for limited/low resource settings (PML4LRS) @ ICLR 2024
Autoregressive activity prediction for low-data drug discovery
Johannes Schimunek · Lukas Friedrich · Daniel Kuhn · Günter Klambauer
Autoregressive modeling is the main learning paradigm behind the currently so successful large language models (LLMs). For sequential tasks, such as generating natural language, autoregressive modeling is a natural choice: the sequence is generated by continuously appending the next sequence token. In this work, we investigate whether the autoregressive modeling paradigm could also be successfully used for molecular activity and property prediction models, which are the equivalent of LLMs in the molecular sciences. To this end, we formulate autoregressive activity prediction modeling (AR-APM), draw relations to transductive and active learning, and assess the predictive quality of AR-APM models in few-shot learning scenarios. Our experiments show that using an existing few-shot learning system without any other changes, except switching to autoregressive mode for inference, improves ∆AUC-PR by up to ∼40%. Code is available here: https://github.com/ml-jku/autoregressiveactivityprediction.
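To make "switching to autoregressive mode for inference" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that autoregressive inference means predicting the unlabeled query molecules one at a time and appending each prediction to the support set as a pseudo-label before predicting the next. The ordering rule (most confident prediction first), the hard 0.5 threshold, the toy similarity-based predictor, and all function names are hypothetical choices made for illustration; they are not taken from the abstract or the linked repository.

```python
import numpy as np


def similarity_predict(support_x, support_y, query_x, temperature=10.0):
    """Toy stand-in for a few-shot model (an assumption, not the paper's model).

    Returns an estimate of P(active) for each query molecule as a
    softmax-weighted vote over cosine similarities to the support set.
    """
    s = support_x / np.linalg.norm(support_x, axis=1, keepdims=True)
    q = query_x / np.linalg.norm(query_x, axis=1, keepdims=True)
    sims = q @ s.T                                  # (n_query, n_support)
    weights = np.exp(temperature * sims)
    weights /= weights.sum(axis=1, keepdims=True)   # rows sum to 1
    return np.clip(weights @ support_y.astype(float), 0.0, 1.0)


def autoregressive_predict(support_x, support_y, query_x,
                           predict_fn=similarity_predict):
    """Predict queries one by one, growing the support set with pseudo-labels."""
    # Work on copies so the caller's support set is left untouched.
    support_x = np.array(support_x, dtype=float)
    support_y = np.array(support_y, dtype=int)
    remaining = np.arange(len(query_x))   # indices of queries not yet predicted
    probs = np.empty(len(query_x))

    while remaining.size > 0:
        p = predict_fn(support_x, support_y, query_x[remaining])
        # Assumed ordering rule: commit to the most confident prediction,
        # i.e. the one farthest from the 0.5 decision boundary.
        best = int(np.argmax(np.abs(p - 0.5)))
        idx = remaining[best]
        probs[idx] = p[best]
        # Append the query and its hard pseudo-label to the support set.
        support_x = np.vstack([support_x, query_x[idx:idx + 1]])
        support_y = np.append(support_y, int(p[best] >= 0.5))
        remaining = np.delete(remaining, best)

    return probs
```

In this sketch the underlying predictor is left unchanged and only the inference loop differs from standard one-shot prediction, which mirrors the setting described in the abstract. Any few-shot model that maps a support set and queries to activity probabilities could be passed as `predict_fn`.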