DIVAN: accurate identification of non-coding disease-specific risk variants using multi-omics profiles

  • Li Chen, Peng Jin, Zhaohui S. Qin
  • December 2016, Springer Science + Business Media
  • DOI: 10.1186/s13059-016-1112-z

Disease-specific non-coding variant annotation

What is it about?

DIVAN is a machine learning-based algorithm that predicts whether a mutation occurring anywhere in the genome is likely to be disease-associated. It is related to popular algorithms such as GWAVA, CADD, Eigen and GenoCanyon. A key difference is that DIVAN is disease-specific: it can make different predictions for the same mutation for different diseases or traits.

Why is it important?

90% of the disease-associated variants found by GWAS are non-coding. Annotating non-coding variants is important yet challenging. In recent years, popular tools including GWAVA, CADD, Eigen and GenoCanyon have been developed to address this problem. These methods predict whether a mutation is likely to be "risky" or neutral, regardless of disease. However, it is likely that a particular variant is associated with only one particular disease. We believe disease-specific annotation of non-coding variants is much more important, yet it has received little attention so far. A secondary but very surprising finding is that the most important feature for distinguishing risk variants from benign ones is not the enrichment of open chromatin marks, as many have previously noticed and reported, but the depletion of closed chromatin marks around the risk variants, especially H3K9me3.

Perspectives

Dr Zhaohui S. Qin
Emory University

Even with the entire human population, GWAS is not going to identify all the disease-associated variants for any disease, due to constraints and complications such as minor allele frequency (MAF) and linkage disequilibrium (LD). DIVAN provides a potential solution to this problem. It is like a bridge that connects population-based studies like GWAS with molecular profiles like histone modification and DNA methylation. In this sense, methods like DIVAN can be very useful tools for interpreting personal genome sequencing data, where rare and non-coding variants are the most frequent findings.

Read Publication

http://dx.doi.org/10.1186/s13059-016-1112-z

The following have contributed to this page: Dr Zhaohui S. Qin