GSHAPA: Gene Set Analysis for Single-Cell RNAseq Using Random Forest and SHAP Values

Sara Khademioureh; Irina Dinu; Sergio Peignier

doi:10.1145/3672608.3707901

What is it about?

GSHAPA is a new machine learning tool that helps scientists find which groups of genes are linked to diseases by analyzing single-cell data. It works better than older methods because it can detect complex relationships between genes, analyze individual patient samples, and deliver more accurate results with fewer false leads. This helps researchers better understand how diseases work at the cellular level and could eventually lead to more personalized treatments.

Why is it important?

GSHAPA advances gene set analysis for single-cell RNA-seq data by providing individualized results at the patient level. Building on existing approaches, our method combines Random Forest models with SHAP values to capture complex interactions between genes. The timing matters: single-cell sequencing has revolutionized genomics, but analysis tools struggle with the complexity of the data it produces. Our approach is five times faster than established methods while delivering more accurate results with fewer false positives. This efficiency accelerates research by helping scientists identify disease-specific pathways more reliably.

Most significantly, GSHAPA's demonstrated success in patient-specific analysis opens new possibilities for precision medicine. By revealing how gene pathways differ between individual patients across multiple diseases (including diabetes, COVID-19, and Alzheimer's), it enables researchers to identify personalized molecular signatures that could lead to more targeted treatments. For researchers navigating increasingly complex genomic data, this is a significant improvement in extracting meaningful, actionable insights.
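To make the idea of per-patient pathway scores more concrete, here is a minimal sketch in Python. It is not the GSHAPA implementation. It assumes a hand-written toy scoring function (`toy_model`) in place of a trained Random Forest, computes exact Shapley values by enumeration in place of the SHAP library, and sums absolute gene-level attributions within each gene set as one possible way to score a pathway. The gene sets, background cells, and patient values are all invented for illustration.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical stand-in for a trained classifier (NOT a Random Forest).
    # Genes 0 and 1 only matter together (an interaction), gene 2 acts alone,
    # and gene 3 has no effect on the score.
    g0, g1, g2, g3 = x
    return 2.0 * g0 * g1 + 1.0 * g2 + 0.0 * g3


def coalition_value(model, sample, background, subset):
    # Average model output when the genes in `subset` take the sample's values
    # and every other gene is filled in from each background cell in turn.
    total = 0.0
    for ref in background:
        mixed = [sample[i] if i in subset else ref[i] for i in range(len(sample))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, sample, background):
    # Exact Shapley values by enumerating all coalitions.
    # Only feasible for a handful of genes; used here purely for illustration.
    n = len(sample)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                with_i = coalition_value(model, sample, background, s | {i})
                without_i = coalition_value(model, sample, background, s)
                phi[i] += weight * (with_i - without_i)
    return phi


def gene_set_scores(phi, gene_sets):
    # Assumed aggregation rule: sum of absolute attributions within each set.
    return {name: sum(abs(phi[i]) for i in idx) for name, idx in gene_sets.items()}


# Invented toy data: four background cells and one patient cell, four genes each.
background = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
gene_sets = {"pathway_A": [0, 1], "pathway_B": [2, 3]}
patient = [1.0, 1.0, 0.0, 1.0]

phi = shapley_values(toy_model, patient, background)
scores = gene_set_scores(phi, gene_sets)
print(phi)
print(scores)
```

Because the attributions are computed for a single sample, each patient (or cell) gets its own pathway scores. In this toy example the interaction between genes 0 and 1 is split evenly between them, so `pathway_A` scores higher than `pathway_B` for this patient. A real analysis would use a trained model and an efficient tree-based SHAP algorithm rather than brute-force enumeration.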

Perspectives

As a statistician who dedicated my PhD to applying and evaluating gene expression analysis methods, I have experienced first-hand the complexities and limitations of traditional statistical methods, especially when applied to single-cell data. The extreme dimensionality and sparsity of scRNA-seq data consistently challenge conventional approaches, often leading to false discoveries or missed biological signals. Working with our team on GSHAPA has been especially rewarding because it addresses the fundamental limitations I have struggled with throughout my academic journey. Watching the combination of statistics and machine learning capture relationships between genes that traditional methods miss has been particularly satisfying, as has the saving in resources.

What excites me most is GSHAPA's ability to analyze individual patient samples. During our collaborative research, seeing unique pathway signatures emerge from each patient's data was truly eye-opening. This personalized aspect feels like a glimpse into the future of medicine, where treatments might be tailored to each person's unique cellular landscape.

The computational efficiency was another surprising outcome. We expected improvements, but achieving results five times faster than established methods exceeded our expectations and makes the approach practical for researchers with limited computational resources.

This is just the beginning. Our team is actively working to enhance GSHAPA with more customization options for different scenarios and data types. I'm excited for researchers to experience these improvements in upcoming versions as we continue refining our approach to make single-cell analysis more accessible and insightful for the scientific community.
Sara Khademioureh
University of Alberta

This page is a summary of: GSHAPA: Gene Set Analysis for Single-Cell RNAseq Using Random Forest and SHAP Values, March 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3672608.3707901.

Contributors

The following have contributed to this page

Sara Khademioureh
University of Alberta
