What is it about?
The annotation of protein-coding genes in assembled genomes is essential for functional genomics and biotechnological applications. In particular, precise gene annotation is a prerequisite for off-target validation in genome editing, which is now widely performed in biological studies. However, most published annotations omit important genes that are actually present in the assembled genomes, and this has biased subsequent studies. To address this issue, we developed Target Gene Family Finder (TGFam-Finder), a tool for the structural annotation of protein-coding genes that contain target domain(s) of interest in assembled genomes. We extensively validated its annotation accuracy, tested its efficiency, and compared the results with those obtained using Maker2 and GeMoMa, two conventional annotation tools in current use.

We annotated far-red-impaired response 1 (FAR1), nucleotide-binding and leucine-rich-repeat (NLR), and cytochrome P450 (CYP450) genes in the Arabidopsis, rice and maize genomes using 11 parameter settings (1 for TGFam-Finder, 5 for GeMoMa and 5 for Maker2). To evaluate all combinations of these tools, we compared i) 11 combined gene models grouped into sets of trials with distinct parameters, ii) 33 combined models grouped by family or species, and iii) 99 individual gene models covering each family in each species. TGFam-Finder achieved higher annotation coverage and a strikingly shorter runtime than the existing tools. Large-scale re-annotation with TGFam-Finder newly identified an average of 150, 166 and 86 additional FAR1 transcription factor, NLR and CYP450 genes, respectively, in 50 plant genomes.
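The 11, 33 and 99 figures in the evaluation design follow from simple multiplication. The sketch below enumerates the runs, assuming that the 33 combined models are 11 settings × 3 families (or, equivalently in count, × 3 species) and that the 99 individual models are 11 settings × 3 families × 3 species. The setting labels (e.g. "GeMoMa-1") are hypothetical placeholders, not names used by the authors.

```python
from itertools import product

# Parameter-setting counts per tool, as stated in the summary.
# The generated labels such as "GeMoMa-1" are hypothetical placeholders.
SETTINGS_PER_TOOL = {"TGFam-Finder": 1, "GeMoMa": 5, "Maker2": 5}
FAMILIES = ["FAR1", "NLR", "CYP450"]
SPECIES = ["Arabidopsis", "rice", "maize"]


def parameter_settings():
    """Expand the per-tool counts into one label per parameter setting."""
    return [
        f"{tool}-{i}"
        for tool, n in SETTINGS_PER_TOOL.items()
        for i in range(1, n + 1)
    ]


def individual_runs():
    """Assumed design: one gene model per (setting, family, species) combination."""
    return list(product(parameter_settings(), FAMILIES, SPECIES))


settings = parameter_settings()
runs = individual_runs()
print(len(settings))                  # 11 parameter settings
print(len(settings) * len(FAMILIES))  # 33 models when grouped by family
print(len(runs))                      # 99 individual gene models
```

Under this reading, every tool/setting is run once per family per species, which is what allows all combinations of tools to be compared.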
Using mass spectrometry data from seven plant species (rice, bean, barley, pepper, grape, apple and eucalyptus), we detected significantly more translated genes in the new annotation (21%, 991 of 4,764) than in the previous annotation (10%, 795 of 7,591). Our approach could provide an optimal solution for identifying and characterizing target gene families and could accelerate functional, comparative and evolutionary studies in plants.
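The summary calls the difference significant but does not state which statistical test was used. As an illustration only, the sketch below recomputes the two proportions from the reported counts and applies a pooled two-proportion z-test, one common choice for comparing proportions between two groups. Treating the two annotations as independent samples, and the choice of test itself, are assumptions; the authors may have used a different method.

```python
import math


def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled two-proportion z-test.

    Returns (p1, p2, z, two-sided p-value) for k1/n1 versus k2/n2.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


# Counts from the summary: translated genes out of annotated genes.
p_new, p_old, z, p = two_proportion_z_test(991, 4764, 795, 7591)
print(f"new: {p_new:.1%}, previous: {p_old:.1%}, z = {z:.1f}, p = {p:.1e}")
```

The recomputed proportions round to the 21% and 10% quoted above, and the z statistic is large, which is consistent with the reported significance under this assumed test.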
Read the Original
This page is a summary of: TGFam‐Finder: a novel solution for target‐gene family annotation in plants, New Phytologist, June 2020, Wiley, DOI: 10.1111/nph.16645.