What is it about?

Biobanks are collections of human biological samples (such as blood, tissue, or DNA) paired with health, lifestyle, and demographic information. They are vital resources for medical research, helping scientists understand diseases and develop new treatments. Despite their importance, there has been no comprehensive way to measure their impact and value until now. Our research team used computational methods to identify 2,663 biobanks worldwide and analyzed their influence across scientific publications, grants, patents, clinical trials, and public policy documents.

We discovered that biobank research focuses heavily on just a few disease areas. Seven out of ten biobanks specialize in general health, nervous system conditions, urogenital issues, cancer, infections, or cardiovascular diseases. Within these categories, obesity, Alzheimer's disease, breast cancer, and diabetes receive the most attention.

We also found that a biobank's impact is shaped more by scientific collaboration than by traditional citation metrics. Surprisingly, 41.1% of scientific papers that use biobank resources fail to cite the biobank's reference papers. However, 59.6% of papers include a biobank team member as a co-author, suggesting that collaboration rather than citation is how biobanks gain recognition.

Finally, our analysis revealed that high-impact biobanks share specific characteristics: they are more open to external researchers, offer high-quality data (especially linked medical records), provide advanced genetic data such as whole-genome sequencing, and are led by highly cited researchers. Contrary to popular belief, a large sample size is not necessarily associated with greater impact.

Understanding what makes biobanks successful can help guide future investments and improve how these resources are developed and used. Our findings suggest that prioritizing data quality, accessibility, and collaboration may be more important than simply collecting larger samples. We've made our findings available through an open-access web application that allows users to search, explore, and compare biobanks, providing a tool for researchers, funders, and policymakers seeking to maximize the impact of these essential resources.

Why is it important?

Biobanks require significant financial resources and infrastructure to establish and maintain, so identifying the factors that contribute to their impact is crucial for ensuring that these investments yield the greatest scientific and public health benefits. Our research shows that traditional metrics like citation counts underestimate biobanks' true scientific impact: over 40% of papers using biobank data fail to cite these resources properly. This "hidden citation" phenomenon highlights the need for better attribution practices and alternative measures of impact. Our findings also suggest that the common emphasis on amassing large sample sizes may be misplaced. Focusing on data quality, particularly linked medical records, and on accessibility to external researchers may yield greater returns. By understanding these dynamics, funding agencies can better allocate resources, biobank operators can implement more effective governance policies, and researchers can make more informed choices about which biobanks to use. Ultimately, optimizing biobank impact has far-reaching implications for accelerating medical discoveries, improving disease understanding, and developing new treatments, particularly for conditions such as obesity, Alzheimer's disease, and diabetes, which currently receive the most biobank-based research attention.

Perspectives

Biobanks are the unsung heroes of biomedical research, contributing immensely to our understanding of diseases from cancer to Alzheimer's. Yet they often lack the credit channels that other scientific institutions enjoy. As this paper reveals, over 40% of publications using biobank data fail to cite these resources properly, creating a significant "hidden citation" problem that undervalues their true impact. This systematic undercitation means that traditional metrics fail to capture biobanks' vital role in accelerating medical discoveries and advancing public health initiatives. This study is particularly valuable because, until now, the field has relied primarily on survey data and anecdotal evidence to understand what drives biobank usage and credit allocation. Its computational analysis of over 2,600 biobanks across scientific publications, grants, patents, clinical trials, and policy documents provides the first data-driven assessment of their actual impact. By identifying specific features associated with high-impact biobanks (such as open data policies and the quality of linked medical records, rather than simply sample size), this research offers actionable insights for biobank operators, funding agencies, and researchers. This evidence-based approach to evaluating biobank contributions is a significant advance over previous subjective assessments, and it provides a framework for maximizing the scientific and societal returns on these important investments.

Rodrigo Dorantes Gilardi
Northeastern University

Read the Original

This page is a summary of: Quantifying the impact of biobanks and cohort studies, Proceedings of the National Academy of Sciences, April 2025.
DOI: 10.1073/pnas.2427157122