What is it about?

Our AI system is given an amino acid sequence, and its goal is to predict which of the exponentially many possible codon sequences the organism actually uses. Codon bias has long been known: some codons are used more often than others, but which ones are preferred varies among organisms. The most straightforward prediction picks the most frequent codon for each amino acid; a refinement uses codon-pair frequencies. What is cool about using a generative AI model is that it predicts more accurately than these frequency-based approaches. This is expected, since AI models can learn complex codon patterns. Because our predictions are tested fairly (on separate training and test sets), their accuracy suggests that the learned patterns are not random. We believe they are related to the evolutionary process.
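To make the frequency-based baseline concrete, here is a minimal Python sketch of the "most frequent codon" prediction. It is only an illustration, not the method or code from the paper: the sequences are made-up toy data, the codon table is a small excerpt of the standard genetic code, and the function names (`count_codons`, `most_frequent_codon_table`, `predict`, `codon_accuracy`) are hypothetical.

```python
from collections import Counter, defaultdict

# Small excerpt of the standard genetic code (only what the toy data needs).
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "CTG": "L", "CTT": "L",
}


def count_codons(cds_list):
    """Count how often each codon encodes each amino acid in the training genes."""
    counts = defaultdict(Counter)
    for cds in cds_list:
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            aa = CODON_TO_AA.get(codon)
            if aa is not None:
                counts[aa][codon] += 1
    return counts


def most_frequent_codon_table(counts):
    """Map each amino acid to its most frequent codon."""
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def predict(protein, table):
    """Baseline prediction: always emit the most frequent codon."""
    return "".join(table[aa] for aa in protein)


def codon_accuracy(predicted, actual):
    """Fraction of codon positions where the prediction matches the real gene."""
    pred = [predicted[i:i + 3] for i in range(0, len(predicted), 3)]
    act = [actual[i:i + 3] for i in range(0, len(actual), 3)]
    return sum(p == a for p, a in zip(pred, act)) / len(act)


# Toy training genes (assumed data) and a held-out test gene.
train = ["ATGAAAGAACTG", "ATGAAAGAGCTG", "ATGAAGGAACTT"]
test_protein = "MKEL"
test_cds = "ATGAAAGAACTT"

table = most_frequent_codon_table(count_codons(train))
predicted = predict(test_protein, table)
accuracy = codon_accuracy(predicted, test_cds)
print(predicted, accuracy)
```

Keeping the test gene out of the counts mirrors the separate train/test split mentioned above. A generative model would be scored with the same kind of per-codon accuracy, which is what makes the comparison with this baseline meaningful.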

Why is it important?

Generating codon sequences for proteins is an important task in biotechnology: it is needed to produce non-host proteins, whether in bacterial cell factories, as vaccines, or for agricultural purposes. We think that in this field too, AI is going to be a game changer.

Read the Original

This page is a summary of: Predicting gene sequences with AI to study codon usage patterns, Proceedings of the National Academy of Sciences, December 2024.
DOI: 10.1073/pnas.2410003121.
