What is it about?

We propose a modification that corrects split-improvement variable importance measures in random forests and other tree-based methods. These measures have been shown to be biased towards features with more potential split points. We show that by measuring split-improvement on out-of-sample data, this bias can be corrected, yielding better summaries and screening tools.
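As a minimal sketch of the idea (not the authors' exact implementation), one can route a held-out sample through a fitted decision tree and credit each feature with the impurity decrease its splits achieve on that out-of-sample data. The helper name `oos_split_importance` and all details below are illustrative, using scikit-learn's tree internals:

```python
# Illustrative sketch: out-of-sample split-improvement importance.
# Assumes a variance (MSE) impurity, as in DecisionTreeRegressor.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

def oos_split_importance(tree, X_val, y_val):
    t = tree.tree_
    imp = np.zeros(X_val.shape[1])

    def var(y):
        return y.var() if len(y) > 0 else 0.0

    def recurse(node, idx):
        if t.children_left[node] == -1:  # leaf node
            return
        f, thr = t.feature[node], t.threshold[node]
        left = idx[X_val[idx, f] <= thr]
        right = idx[X_val[idx, f] > thr]
        n, nl, nr = len(idx), len(left), len(right)
        if n > 0:
            # Weighted impurity decrease on held-out data; unlike the
            # in-sample version, this can be negative, which is what
            # removes the bias towards high-cardinality features.
            decrease = (var(y_val[idx])
                        - (nl / n) * var(y_val[left])
                        - (nr / n) * var(y_val[right]))
            imp[f] += (n / len(y_val)) * decrease
        recurse(t.children_left[node], left)
        recurse(t.children_right[node], right)

    recurse(0, np.arange(len(y_val)))
    return imp

# Toy demo: feature 0 is informative, feature 1 is pure noise with
# many distinct values (the setting where in-sample importance inflates).
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=500),
                     rng.integers(0, 100, size=500)])
y = X[:, 0] + rng.normal(scale=0.5, size=500)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, y_tr)
print("in-sample importance:", model.feature_importances_)
print("out-of-sample importance:", oos_split_importance(model, X_val, y_val))
```

On data like this, the in-sample importance assigns the noise feature a nontrivial score simply because it offers many candidate splits, while the held-out version shrinks it towards zero.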

Why is it important?

Machine learning models are ubiquitous in everyday applications. Practitioners often rely on feature importance measures to understand model behavior. However, a widely used feature importance metric for tree-based methods is inherently biased. In this paper we analyze this phenomenon and propose a simple yet effective correction.

Read the Original

This page is a summary of: Unbiased Measurement of Feature Importance in Tree-Based Methods, ACM Transactions on Knowledge Discovery from Data, April 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3429445.