What is it about?

This paper examines duplication in biological sequence databases in detail. It categorizes the different kinds of duplication and demonstrates, through a case study, the practical impact that unrecognized duplicates can have on tasks that rely on these databases. In brief, it explores one aspect of data quality in these databases.

Why is it important?

Biological sequence databases are heavily used to support biological research and data analysis, so it is important to understand the sorts of quality issues that occur in these resources. With this understanding, we hope to develop strategies for using the databases more effectively and for mitigating the negative impacts of these issues.

Read the Original

This page is a summary of: Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study, Database, January 2017, Oxford University Press (OUP), DOI: 10.1093/database/baw163.
