ICLR Aligners: Decoupling LLMs and Alignment

Poster in Affinity Workshop: Tiny Papers Poster Session 2

Aligners: Decoupling LLMs and Alignment

Lilian Ngweta · Mayank Agarwal · Subha Maity · Alex Gittens · Yuekai Sun · Mikhail Yurochkin

Halle B #346

Tue 7 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Large Language Models (LLMs) need to be aligned with human expectations to ensure their safety and utility in most applications. Alignment is challenging, costly, and needs to be repeated for every LLM and alignment criterion. We propose to decouple LLMs and alignment by training aligner models that can be used to align any LLM for a given criterion on an as-needed basis, thus also reducing the potential negative impacts of alignment on performance. Our recipe for training the aligner models relies solely on synthetic data generated with a (prompted) LLM and can be easily adjusted for a variety of alignment criteria. We illustrate our method by training an "ethical" aligner and verify its efficacy empirically.
