This is just one example of how generative AI enables programmable medicine, and we’re thrilled about its potential. The combination of powerful large language models and the industry developing high volumes of high-throughput genomics data means we’re on the cusp of major breakthroughs.
Ron Hause, PhD, Senior Vice President, Head of AI at ShapeTX
There’s a lot of buzz in the world of deep learning thanks to the emergence of generative AI. You may have heard of DALL-E 2, a generative model trained on vast amounts of annotated data that creates new images from text prompts. But the technology that powers tools like DALL-E 2 and ChatGPT has other, less well-known applications.
At ShapeTX, we’re harnessing the power of generative AI to create novel medicines that repair the genetic causes of diseases. Alongside advancements in RNA technology, generative AI will make it possible to program medicines for diseases that were previously seen as “untreatable.”
Here’s what that looks like.
Our RNA editing technology, RNAfix®, delivers DNA-encoded guide RNAs (gRNAs) to recruit a protein called adenosine deaminase acting on RNA (ADAR) to precisely correct a disease-causing genetic mutation. By screening millions of gRNAs, we can learn the patterns and structures in RNA that direct ADAR to edit a single, specific site.
Then, we use our gRNA editing data to train generative deep learning models, treating RNA strands as matrices to design novel RNA “images.” This allows us to explore a larger landscape of RNA sequence and structural possibilities than ever before.
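To make the idea of an RNA “image” concrete, here is a minimal sketch of one common way to turn a sequence into a matrix: one-hot encoding, with one row per nucleotide and one column per position. This is an assumption for illustration only; the encoding actually used for RNAfix® is not described here, and the names `one_hot_encode` and `decode` are hypothetical.

```python
# Illustrative only: one-hot encoding is an assumed representation,
# not necessarily the one used in the work described above.
NUCLEOTIDES = "ACGU"


def one_hot_encode(rna):
    """Return a 4 x len(rna) matrix; row i marks where NUCLEOTIDES[i] occurs."""
    rna = rna.upper()
    unknown = set(rna) - set(NUCLEOTIDES)
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return [[1 if base == nt else 0 for base in rna] for nt in NUCLEOTIDES]


def decode(matrix):
    """Read a sequence back by picking the highest-scoring row in each column."""
    length = len(matrix[0])
    return "".join(
        NUCLEOTIDES[max(range(4), key=lambda i: matrix[i][col])]
        for col in range(length)
    )


example = one_hot_encode("GAUC")
round_trip = decode(example)
```

Because `decode` takes the largest value in each column, it also works on real-valued matrices, which is what a generative model would typically produce.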
Creating novel therapeutic gRNAs
Unlike DALL-E—which generates fantastical images that a camera couldn’t capture—our work is constrained by biological plausibility. We also need to focus on generating gRNAs that achieve highly efficient and specific RNA editing.
We do this through a process called conditioning, which teaches the model what qualities make for efficient and targeted gRNAs. We feed “images” of various gRNAs into the model along with information about how well they worked when we tested them.
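One simple way to pair an “image” with its lab results is to append the measurements as extra constant rows, so that every position in the matrix carries the conditioning values. The sketch below assumes a one-hot gRNA matrix and two measurements named `on_target` and `specificity`; the function name, the measurement names, and the numbers are placeholders for illustration, not real assay results or a description of the actual model input.

```python
def attach_condition(matrix, on_target, specificity):
    """Append one constant row per condition value so every position sees it."""
    length = len(matrix[0])
    rows = [list(row) for row in matrix]   # copy; leave the input untouched
    rows.append([on_target] * length)
    rows.append([specificity] * length)
    return rows


# One-hot matrix for the toy sequence "GAUC" (rows are A, C, G, U).
grna_image = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]

# Placeholder values for illustration, not measured results.
model_input = attach_condition(grna_image, on_target=0.85, specificity=0.95)
```

Other conditioning schemes exist, such as feeding the measurements in as a separate embedding; channel concatenation is shown here only because it is easy to read.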
With this data, the model can learn to generate completely new gRNAs that it has never seen before. Training starts with a step called forward diffusion, in which the model randomly adds noise to gRNA “images” until they are unrecognizable. Think of this like adding static to a high-res photo of a person until you can no longer see their face.
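The noising step can be sketched in a few lines. The example below assumes the standard formulation from denoising diffusion models, in which a noisy sample is a weighted mix of the clean matrix and Gaussian noise, with a linear noise schedule. The schedule values, the step count, and the function names are illustrative assumptions, not details of our model.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    span = max(num_steps - 1, 1)
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / span
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(matrix, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [[signal * v + sigma * rng.gauss(0.0, 1.0) for v in row] for row in matrix]


rng = random.Random(0)
x0 = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]  # one-hot "GAUC"
alpha_bars = alpha_bar_schedule(1000)

slightly_noisy = forward_diffuse(x0, alpha_bars[0], rng)   # early step: nearly clean
mostly_noise = forward_diffuse(x0, alpha_bars[-1], rng)    # late step: signal almost gone
```

The cumulative factor shrinks toward zero as the step index grows, which is why late samples are dominated by noise while early ones still resemble the original matrix.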
Then, our model learns the reverse diffusion process: removing the noise from the gRNA “images.” To generate novel gRNAs, we simply run the reverse diffusion process on randomly generated “images,” like refining fuzzy photographs of people until we can see their faces. The model generates gRNAs with high on-target editing and high specificity, because the conditioning information it’s seen taught it how those variables correlate with gRNAs that “look” a certain way.
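For readers who want to see the mechanics, the sketch below runs a DDPM-style reverse process: start from random noise and repeatedly subtract the noise a denoiser predicts, given a condition. A trained neural network would normally play the denoiser role. Here a stand-in (`make_toy_denoiser`, hypothetical) simply “knows” one clean matrix per condition label, so the example runs without any training. The condition labels, sequences, schedule, and function names are all assumptions made for illustration.

```python
import math
import random

NUCLEOTIDES = "ACGU"


def one_hot(rna):
    return [[1.0 if base == nt else 0.0 for base in rna] for nt in NUCLEOTIDES]


def decode(matrix):
    cols = range(len(matrix[0]))
    return "".join(NUCLEOTIDES[max(range(4), key=lambda i: matrix[i][c])] for c in cols)


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    span = max(num_steps - 1, 1)
    return [beta_start + (beta_end - beta_start) * t / span for t in range(num_steps)]


def cumulative_alpha_bars(betas):
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def make_toy_denoiser(templates, betas):
    """Stand-in for a trained network: it 'knows' one clean image per condition."""
    alpha_bars = cumulative_alpha_bars(betas)

    def predict_noise(x, t, condition):
        clean = templates[condition]
        signal = math.sqrt(alpha_bars[t])
        sigma = math.sqrt(1.0 - alpha_bars[t])
        return [[(x[i][j] - signal * clean[i][j]) / sigma for j in range(len(x[0]))]
                for i in range(len(x))]

    return predict_noise


def sample(predict_noise, condition, shape, betas, rng):
    """Start from pure noise and denoise one step at a time."""
    alpha_bars = cumulative_alpha_bars(betas)
    rows, cols = shape
    x = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    for t in reversed(range(len(betas))):
        eps_hat = predict_noise(x, t, condition)
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no fresh noise on the final step
        x = [[scale * (x[i][j] - coef * eps_hat[i][j]) + sigma * rng.gauss(0.0, 1.0)
              for j in range(cols)] for i in range(rows)]
    return x


betas = linear_betas(200)
templates = {"high_editing": one_hot("GAUCGA"), "low_editing": one_hot("CCUAGG")}
denoiser = make_toy_denoiser(templates, betas)
generated = sample(denoiser, "high_editing", (4, 6), betas, random.Random(1))
sequence = decode(generated)
```

With this stand-in, the sample decodes back to the template stored for the chosen condition, which shows how the condition steers the result. A trained model has no stored templates; it generalizes from many examples, which is what lets it produce sequences it has never seen.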
This video shows denoising for a candidate gRNA. This gRNA targets the LRRK2*G2019S mutation that causes Parkinson’s disease in a subset of patients with the condition.
DALL-E uses conditioning and diffusion to produce novel images, like a white unicorn with big eyes riding a motorcycle. The model has learned about each of those characteristics: unicorn, white, big eyes, and motorcycle. That means it can start with noise and produce a brand new image with those characteristics.
Once our model has generated a gRNA that it predicts can edit the genetic mutation with exceptional efficiency and specificity, we test how accurate its prediction was in the lab: in other words, how effective the gRNA is at recruiting ADAR to correct that mutation. We can then add those lab validation results back into the model so it can continuously learn and iterate.
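This generate, test, and retrain cycle can be written as a short loop. The sketch below is only a schematic: `generate`, `measure_in_lab`, and `retrain` are hypothetical callables standing in for the generative model, the wet-lab assay, and the training run, and the stubs used to exercise the loop return placeholder values rather than real data.

```python
def design_cycle(generate, measure_in_lab, retrain, dataset, rounds, batch_size):
    """Run `rounds` of generate -> test -> retrain, growing `dataset` in place."""
    for _ in range(rounds):
        candidates = generate(batch_size)
        measured = [(grna, measure_in_lab(grna)) for grna in candidates]
        dataset.extend(measured)   # lab results become new training examples
        retrain(dataset)
    return dataset


# Stubs so the sketch runs end to end; none of these reflect real assays or models.
retrain_sizes = []
history = design_cycle(
    generate=lambda n: ["GAUC"] * n,
    measure_in_lab=lambda grna: 0.0,   # placeholder, not a measured result
    retrain=lambda data: retrain_sizes.append(len(data)),
    dataset=[],
    rounds=3,
    batch_size=2,
)
```

Each pass through the loop gives the model more measured examples to learn from than the pass before.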