Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms
Alejandro H. Artiles ⋅ Martin Weiss ⋅ Levin Brinkmann ⋅ Anirudh Goyal ⋅ Nasim Rahaman
Abstract
Large language models are adept at synthesizing and recombining familiar material, yet they often fail at a specific kind of creativity that matters most in research: producing ideas that are both \emph{coherent} and \emph{non-obvious} to the current community. We formalize this gap through \emph{cognitive availability}, the likelihood that a research direction would be naturally proposed by a typical researcher given what they have worked on. We introduce a pipeline that (i) decomposes papers into granular conceptual units, (ii) clusters recurring units into a shared vocabulary of \emph{idea atoms}, and (iii) learns two complementary models: a \emph{coherence} model that scores whether a set of atoms constitutes a viable direction, and an \emph{availability} model that scores how likely that direction is to be generated by researchers drawn from the community. We then sample ``alien'' directions that score high on coherence but low on availability. On a corpus of $\sim$7,500 recent LLM papers from NeurIPS, ICLR and ICML, we validate that (a) conceptual units preserve paper content under reconstruction, (b) idea atoms generalize across papers rather than memorizing paper-specific phrasing, and (c) the Alien sampler produces research directions that are more diverse than LLM baselines while maintaining coherence.
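The selection criterion in the abstract (high coherence, low availability) can be illustrated with a minimal sketch. It is not the authors' sampler: the abstract does not specify how the two scores are combined, so the coherence floor, the sort order, the function name `rank_alien`, and the toy scores below are all assumptions made for illustration only.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Atoms = FrozenSet[str]  # a candidate direction, represented as a set of idea atoms


def rank_alien(
    candidates: Iterable[Atoms],
    coherence: Callable[[Atoms], float],     # hypothetical stand-in for the coherence model
    availability: Callable[[Atoms], float],  # hypothetical stand-in for the availability model
    min_coherence: float = 0.5,              # assumed threshold, not from the paper
    top_k: int = 3,
) -> List[Tuple[Atoms, float, float]]:
    """Keep candidates that clear the coherence floor, then prefer low availability."""
    scored = []
    for atoms in candidates:
        c = coherence(atoms)
        a = availability(atoms)
        if c >= min_coherence:
            scored.append((atoms, c, a))
    # Lowest availability first; ties broken by higher coherence.
    scored.sort(key=lambda item: (item[2], -item[1]))
    return scored[:top_k]


# Toy (coherence, availability) scores, invented purely for this example.
TOY_SCORES = {
    frozenset({"retrieval", "long-context"}): (0.9, 0.8),      # coherent but obvious
    frozenset({"retrieval", "curriculum"}): (0.7, 0.2),        # coherent and non-obvious
    frozenset({"tokenization", "reward-model"}): (0.3, 0.1),   # non-obvious but incoherent
}

ranked = rank_alien(
    TOY_SCORES,
    coherence=lambda atoms: TOY_SCORES[atoms][0],
    availability=lambda atoms: TOY_SCORES[atoms][1],
)
for atoms, c, a in ranked:
    print(sorted(atoms), f"coherence={c:.1f}", f"availability={a:.1f}")
```

In this toy setting the incoherent candidate is filtered out, and the coherent candidate with the lowest availability ranks first. A learned pipeline would replace the lookup table with the two trained models described in the abstract.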