What is it about?

Herbarium collections are invaluable resources for studying plant diversity and evolution, and their specimens often serve as a record of past and present plant distributions. To fully leverage these collections, tools are needed that can automatically process and enrich digitized specimens, making the data easier for researchers to access and analyze. Models that work well on sheets with a single specimen are available, but models that can accurately extract multiple specimens from the same image are still needed. This paper addresses that challenge by comparing different deep learning models to identify the best approach for localizing plant specimens on more complex herbarium sheets. We found that segmentation models outperformed detection models and achieved promising results for multi-specimen extraction. The main bottleneck in this research was the lack of labeled data, which is essential for training and evaluating deep learning models. To address this issue, we developed methods to semi-automatically generate specimen annotations based on color segmentation. These annotations were then combined via a copy-paste augmentation method, which improved model accuracy.
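To make the last two steps more concrete, here is a minimal sketch of the general idea in Python with NumPy. It is not the code used in the paper: the function names `color_mask` and `copy_paste`, the median-based background estimate, and the threshold value are all assumptions chosen for illustration. The first function marks pixels whose color differs from the paper background as specimen, and the second pastes a masked specimen onto another sheet while returning the matching annotation mask.

```python
import numpy as np


def color_mask(image, threshold=40.0):
    """Mark pixels whose color differs from the sheet background as specimen.

    Assumption (illustrative only): the background color is estimated as the
    per-channel median of the image, which presumes paper covers most of it.
    """
    background = np.median(image.reshape(-1, 3), axis=0)
    distance = np.linalg.norm(image.astype(np.float64) - background, axis=-1)
    return distance > threshold


def copy_paste(target, source, source_mask, top, left):
    """Paste the masked pixels of `source` onto a copy of `target`.

    The source crop is assumed to fit entirely inside `target` at (top, left).
    Returns the augmented image and a full-size mask of the pasted specimen.
    """
    out = target.copy()
    height, width = source_mask.shape
    # `region` is a view into `out`, so the assignment below edits `out`.
    region = out[top:top + height, left:left + width]
    region[source_mask] = source[source_mask]
    pasted_mask = np.zeros(target.shape[:2], dtype=bool)
    pasted_mask[top:top + height, left:left + width] = source_mask
    return out, pasted_mask
```

Calling `copy_paste` several times on the same target, each time with a different specimen crop and position, would produce a synthetic multi-specimen sheet together with one mask per specimen. A realistic pipeline would also need to handle overlaps, scaling, and labels or stamps that are not part of the plant.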

Why is it important?

This research is an essential step towards making herbarium collections more accessible for research and analysis. Automated localization and extraction of plant specimens from herbarium sheets enables researchers to analyze and compare specimens on a larger scale, which can help to advance our understanding of plant diversity. The work expands upon previous research and applies the developed techniques to complex herbarium sheets featuring multiple specimens. Additionally, methods were developed to semi-automatically generate annotations for herbarium images, significantly reducing the manual annotation effort needed to train the deep learning models.


It was a great pleasure to write this article with my co-authors, as it demonstrates our ongoing research on applying computer vision methods to herbarium collections. Many problems remain in processing and analyzing natural science and archive collections at scale. Automated methods make these collections more accessible and reduce manual annotation effort, opening up new research opportunities.

Kenzo Milleville
Universiteit Gent

Read the Original

This page is a summary of: Automatic extraction of specimens from multi specimen herbaria, Journal on Computing and Cultural Heritage, March 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3575862.