What is it about?

Researchers sometimes want to select a set of variables that are highly predictive of an outcome of interest. Two machine learning algorithms perform this kind of variable selection, but they only operate on complete data sets, and there is no clear guidance on what to do when data are missing. This tutorial describes three ways to use a machine learning algorithm to select a set of predictors when multiple imputation is used to handle the missing data.

Why is it important?

This tutorial can serve as a central resource for two audiences: methodologists interested in expanding on the methods we discuss, and applied researchers who want to know how to implement those methods and understand their limitations.


I learned so much from writing this tutorial, and I'm incredibly proud of the finished product. I hope others find it useful and that I have made researchers' lives a bit easier by laying out, step by step, how to perform these methods.

Heather Gunn

Read the Original

This page is a summary of: How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion, Psychological Methods, February 2022, American Psychological Association (APA).
DOI: 10.1037/met0000478