

Poster in Workshop: Machine Learning for Genomics Explorations (MLGenX)

Unveiling Zero Shot Prediction for Gene Attributes Through Interpretable AI

Ala Jararweh · Oladimeji Macaulay · David Arredondo · Olufunmilola Oyebamiji · Luis Tafoya · Kushal Virupakshappa · Avinash Sahu


Abstract:

Representation learning has transformed the prediction of gene and protein structure and function from sequence, expression, and network data. Yet these approaches tap into only a fraction of the knowledge accumulated over more than a century of genetic research. Here, we introduce GeneLLM, an interpretable transformer-based model that integrates textual information through contrastive learning to refine gene representations. Although it has been posited that such knowledge-based representations could be biased toward well-characterized genes, GeneLLM surprisingly achieves high accuracy across eight gene-related benchmarks. It matches and often outperforms task-specific models, including a 50% increase in accuracy over its closest solubility-specific competitor. It also demonstrates robust zero-shot learning for unseen gene annotations. The model's interpretability and our multimodal strategy for mitigating inherent data biases strengthen its utility and reliability, particularly in biomedical applications where interpretability is paramount. Our findings affirm that unstructured text complements structured databases in enhancing biomedical predictions, while conscientiously addressing interpretability and bias for AI deployment in healthcare. The code and datasets are available on request at https://www.avisahuai.com/tools.
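To make "integrating textual information through contrastive learning" concrete, here is a minimal NumPy sketch of a generic CLIP-style symmetric InfoNCE loss. It pulls each gene embedding toward the embedding of its own text description and pushes it away from the other descriptions in the batch. This is an assumed, illustrative objective, not GeneLLM's published training code. The helper names (`symmetric_info_nce`, `zero_shot_predict`), the temperature of 0.07, and the zero-shot rule are all hypothetical. That rule assigns each gene the annotation whose text embedding is most cosine-similar.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(gene_emb, text_emb, temperature=0.07):
    """Hypothetical CLIP-style contrastive loss.

    Row i of gene_emb is assumed to be paired with row i of text_emb;
    every other row in the batch serves as a negative.
    """
    g = l2_normalize(gene_emb)
    t = l2_normalize(text_emb)
    logits = g @ t.T / temperature  # (n, n) scaled cosine similarities
    idx = np.arange(len(g))
    # Gene -> text: each row should pick out its own description.
    loss_g2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text -> gene: each column should pick out its own gene.
    loss_t2g = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_g2t + loss_t2g)


def zero_shot_predict(gene_emb, label_emb):
    """Hypothetical zero-shot rule: nearest label-description embedding by cosine similarity."""
    sims = l2_normalize(gene_emb) @ l2_normalize(label_emb).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genes = rng.normal(size=(4, 8))
    aligned_text = genes + 0.01 * rng.normal(size=(4, 8))  # well-matched pairs
    print("aligned loss:", symmetric_info_nce(genes, aligned_text))
    print("mismatched loss:", symmetric_info_nce(genes, aligned_text[::-1]))
```

Correctly paired gene and text embeddings should give a lower loss than mismatched pairs. Because `zero_shot_predict` only compares against embeddings of label descriptions, it can in principle score annotations never seen during training, which is the general idea behind text-based zero-shot prediction.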
