Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design
GROQ-seq: A Collaborative, Open Data Approach to Addressing Protein Function Prediction
David Ross · Han Spinner · Simon d'Oelsnitz · Svetlana Ikonomova · Olga Vasilyeva · Nina Alperovich · Kristen Sheldon · Courtney Tretheway · Dana Cortade · Erika DeBenedictis · Peter Kelly
We have developed an experimental platform and a unified data ontology for collecting and sharing open-access data on diverse protein functions, with the goal of enabling predictive models that link sequence to function. Data generation relies on the growth-based quantitative sequencing (GROQ-seq) platform, a simple yet adaptable system that can be readily expanded to encompass new functions. This high-throughput platform can produce quantitative functional characterization data for hundreds of thousands of proteins per experiment at a cost of approximately $0.05 per sequence. To date, we have made significant progress in collecting data for our initial protein function: transcription factor binding. We are also developing GROQ-seq for a suite of additional protein functions, including proteases, aminoacyl-tRNA synthetases, RNA polymerases, histidine kinases, single-chain antibody fragments, and a variety of metabolic enzymes. Because it is both highly scalable and extensible to new protein functions, the GROQ-seq platform enables collection of the diverse data needed to build a generalizable model that quantitatively predicts sequence-to-function relationships.
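
As a rough illustration of what the quoted cost implies per experiment, the following minimal sketch multiplies a library size by the approximate per-sequence figure. The function name and the example library sizes are assumptions chosen for illustration; only the $0.05 figure and the "hundreds of thousands" scale come from the abstract.

```python
# Approximate per-sequence cost quoted in the abstract (USD).
COST_PER_SEQUENCE_USD = 0.05


def estimated_experiment_cost(num_sequences: int,
                              cost_per_sequence: float = COST_PER_SEQUENCE_USD) -> float:
    """Return a rough cost estimate (USD) for characterizing num_sequences proteins.

    Hypothetical helper: a simple linear estimate, not the authors' cost model.
    """
    if num_sequences < 0:
        raise ValueError("num_sequences must be non-negative")
    return num_sequences * cost_per_sequence


if __name__ == "__main__":
    # Illustrative library sizes in the "hundreds of thousands" range.
    for n in (100_000, 300_000, 500_000):
        print(f"{n:,} sequences: about ${estimated_experiment_cost(n):,.0f}")
```

The linear estimate ignores any fixed per-experiment overhead, which the abstract does not report.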