What is it about?

Aligning and integrating different datasets is a key challenge in single-cell research. But which datasets should be aligned, and which should not? Our paper proposes a rigorous test, based on random matrix theory, for deciding whether a pair of datasets is (at least partially) alignable, together with a new spectral alignment algorithm that integrates high-dimensional data with minimal distortion.

Why is it important?

Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to integrate heterogeneous single-cell datasets by removing unwanted technical or biological variation. Despite their wide use, existing methods have two fundamental and under-appreciated limitations. First, there is no rigorous statistical test for determining whether two single-cell datasets should be integrated at all. Second, popular methods often substantially distort the biological signal during alignment, which biases downstream analysis and makes it difficult to interpret.

We address both limitations with a unified spectral manifold alignment and inference (SMAI) framework. Built on recent advances in random matrix theory and high-dimensional statistics, SMAI enables principled and interpretable alignability testing and structure-preserving integration of single-cell datasets that share the same type of features. The method preserves within-data structures and improves many downstream analyses, such as the identification of differentially expressed genes and the imputation of spatial transcriptomics data.
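To make "structure-preserving" concrete, here is a minimal toy sketch, not the SMAI algorithm itself. It fits a similarity transformation (one scale factor, one rotation, one shift) that maps one point cloud onto another, and such a map changes every within-dataset distance by the same factor, so relative structure is untouched. The sketch assumes 2-D points with known one-to-one correspondence, which real single-cell data do not provide, and the function names `fit_similarity` and `apply_similarity` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_similarity(source, target):
    """Least-squares scale, rotation angle, and centroids mapping source onto target.

    Toy assumption: 2-D points, with source[i] corresponding to target[i].
    """
    cs, ct = centroid(source), centroid(target)
    s0 = [(x - cs[0], y - cs[1]) for x, y in source]
    t0 = [(x - ct[0], y - ct[1]) for x, y in target]
    # Summed dot and cross products between centered pairs give the optimal rotation.
    dot = sum(a[0] * b[0] + a[1] * b[1] for a, b in zip(s0, t0))
    cross = sum(a[0] * b[1] - a[1] * b[0] for a, b in zip(s0, t0))
    theta = math.atan2(cross, dot)
    scale = math.hypot(dot, cross) / sum(a[0] ** 2 + a[1] ** 2 for a in s0)
    return scale, theta, cs, ct


def apply_similarity(points, scale, theta, cs, ct):
    """Center at cs, rotate by theta, rescale, then move to ct."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        dx, dy = x - cs[0], y - cs[1]
        out.append((ct[0] + scale * (c * dx - s * dy),
                    ct[1] + scale * (s * dx + c * dy)))
    return out


# Toy data: "source" is "target" rotated by 30 degrees, doubled in size, and shifted.
target = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
source = apply_similarity(target, 2.0, math.radians(30), (0.0, 0.0), (5.0, -3.0))

scale, theta, cs, ct = fit_similarity(source, target)
aligned = apply_similarity(source, scale, theta, cs, ct)

# Within-dataset distances change only by the common factor `scale`.
d_before = math.dist(source[0], source[2])
d_after = math.dist(aligned[0], aligned[2])
```

Because the fitted map is a single rotation, rescaling, and shift applied to every point, `d_after` equals `scale * d_before` for any pair of points. A method that warps each dataset nonlinearly offers no such guarantee, which is the kind of distortion the summary above warns about.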

Read the Original

This page is a summary of: Principled and interpretable alignability testing and integration of single-cell data, Proceedings of the National Academy of Sciences, February 2024.
DOI: 10.1073/pnas.2313719121.