IN SILICO GENERATIVE DESIGN OF CHEMICALLY MODIFIED RNA SEQUENCES FOR FUNCTIONAL PREDICTION
Abstract
RNA chemical modifications play a central role in regulating RNA stability, translation, and function. Although generative machine learning has been widely applied to canonical biomolecules, the generative design of chemically modified RNAs remains largely unexplored. We present a fully in silico framework for the conditional generation of RNA sequences with specified epitranscriptomic modifications. Using 987,654 modification sites from RMBase and MODOMICS, we train a conditional variational autoencoder (cVAE) with a 32-dimensional latent space to model RNA sequence context conditioned on modification type. The model generates diverse (94.3% unique) and novel (novelty score: 0.78) sequences while preserving known motifs (similarity: 0.87) and thermodynamic plausibility (ΔMFE: 0.8 kcal/mol, p = 0.12). Generated sequences exhibit modification-specific patterns, with 92.5% conditional accuracy. Our results demonstrate that generative models can explore underrepresented regions of epitranscriptomic sequence space without additional experimental data, providing a computational foundation for future RNA modification design.
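To make the conditioning idea concrete, the following is a minimal sketch of the building blocks a cVAE of this kind typically relies on: a one-hot sequence window concatenated with a one-hot modification label, the reparameterization step for a 32-dimensional diagonal Gaussian latent, the KL term against a standard normal prior, and a uniqueness fraction for generated sequences. It is not the authors' implementation. The label set `MOD_TYPES`, the function names, and the definition of uniqueness as the share of distinct sequences are assumptions made for illustration only.

```python
import math
import random

NUCLEOTIDES = "ACGU"
# Hypothetical label set: the abstract does not list the modification types used.
MOD_TYPES = ["m6A", "m5C", "pseudouridine"]
LATENT_DIM = 32  # latent dimensionality stated in the abstract


def one_hot(index, size):
    """Return a list of `size` floats with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_input(seq, mod_type):
    """Flatten a one-hot sequence window and append the one-hot condition."""
    features = []
    for base in seq:
        features.extend(one_hot(NUCLEOTIDES.index(base), len(NUCLEOTIDES)))
    features.extend(one_hot(MOD_TYPES.index(mod_type), len(MOD_TYPES)))
    return features


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def unique_fraction(seqs):
    """Share of distinct sequences in a generated batch (assumed definition)."""
    return len(set(seqs)) / len(seqs)


if __name__ == "__main__":
    rng = random.Random(0)
    x = encode_input("GGACU", "m6A")
    z = reparameterize([0.0] * LATENT_DIM, [0.0] * LATENT_DIM, rng)
    print(len(x), len(z), unique_fraction(["GGACU", "GGACU", "AGACU", "UGACA"]))
```

In a full model, an encoder network would map the output of `encode_input` to `mu` and `log_var`, and a decoder would receive the sampled latent vector together with the same condition vector; those networks and the reconstruction loss are omitted here.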