Universal Features for the Classification of Coding and Non-coding DNA Sequences

Nicolas Carels; Ramon Vidal; Diego Frías

doi:10.4137/bbi.s2236

What is it about?

In this report, we revisited simple features that distinguish coding sequences (CDS) from non-coding DNA. The sequence sample covered a wide spectrum of codon usage, which suggests that these features are universal. The features investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities at the three positions of nucleotide triplets, (iii) the product of cytosine, guanine, and adenine probabilities at the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of G and C probabilities at the 1st and 2nd positions of triplets.
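As a rough illustration, the four features can be computed from position-specific nucleotide frequencies in a given reading frame. The sketch below is one reading of the summary above, not the authors' implementation: the function names (`triplets`, `position_probabilities`, `features`) are hypothetical, the standard stop codons TAA, TAG, and TGA are assumed, and feature (iv) is interpreted as P(G at position 1) × P(C at position 2), which the summary does not state unambiguously. How the features are combined into a classification decision is described in the original report and is not reproduced here.

```python
from collections import Counter

# Assumption: the standard genetic code's stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def triplets(seq, frame=0):
    """Split seq into complete, non-overlapping triplets starting at offset `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def position_probabilities(codons):
    """Return three dicts mapping nucleotide -> frequency at triplet positions 1, 2, 3."""
    n = len(codons)
    probs = []
    for pos in range(3):
        counts = Counter(codon[pos] for codon in codons)
        probs.append({nt: counts.get(nt, 0) / n for nt in "ACGT"})
    return probs


def features(seq, frame=0):
    """Compute the four features for one reading frame (illustrative only)."""
    codons = triplets(seq, frame)
    if not codons:
        raise ValueError("sequence too short for the requested frame")
    p = position_probabilities(codons)
    # Purines are adenine and guanine.
    purine = [p[i]["A"] + p[i]["G"] for i in range(3)]
    return {
        # (i) number of stop codons found in this frame
        "stop_codons": sum(codon in STOP_CODONS for codon in codons),
        # (ii) product of purine probabilities at positions 1, 2, 3
        "purine_product": purine[0] * purine[1] * purine[2],
        # (iii) P(C at position 1) * P(G at position 2) * P(A at position 3)
        "cga_product": p[0]["C"] * p[1]["G"] * p[2]["A"],
        # (iv) assumed reading: P(G at position 1) * P(C at position 2)
        "gc_product": p[0]["G"] * p[1]["C"],
    }
```

Calling `features(seq, frame)` for each of the three frames of a sequence gives one feature set per frame, which is the kind of input a frame-aware classifier would compare.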

Why is it important?

These features are a natural consequence of the physico-chemical properties of proteins, and their combination classifies CDS and non-coding DNA (introns) with a success rate above 95% for sequences longer than 350 bp. When a sequence is classified as coding, the coding strand and coding frame are deduced implicitly. The method requires no training step.
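Deducing the strand and frame implies evaluating a sequence in all six reading frames: three on the given strand and three on its reverse complement. The sketch below shows only that enumeration, using the stop codon count as the per-frame quantity. The helper names are hypothetical, the standard stop codons are assumed, and the decision rule actually used in the report is not given in this summary.

```python
# Assumption: the standard genetic code's stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stop_count(seq, frame):
    """Count stop codons among complete triplets starting at offset `frame`."""
    return sum(seq[i:i + 3] in STOP_CODONS for i in range(frame, len(seq) - 2, 3))


def six_frame_stop_counts(seq):
    """Map (strand, frame) -> stop codon count for all six reading frames."""
    seq = seq.upper()
    strands = {"+": seq, "-": reverse_complement(seq)}
    return {
        (strand, frame): stop_count(s, frame)
        for strand, s in strands.items()
        for frame in range(3)
    }
```

A frame free of stop codons over several hundred base pairs is a candidate open reading frame. The other features would then be evaluated in the same six frames.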

Perspectives

The method described is expected to contribute to coding frame detection in uncharacterized DNA sequences, such as those produced by shotgun metagenomics.
Nicolas Carels
Oswaldo Cruz Foundation

This page is a summary of: Universal Features for the Classification of Coding and Non-coding DNA Sequences, Bioinformatics and Biology Insights, January 2009, SAGE Publications, DOI: 10.4137/bbi.s2236.

Contributors

The following have contributed to this page

Nicolas Carels
Oswaldo Cruz Foundation
