What is it about?
This paper examines how recommender systems (such as those used for shopping, videos, or courses) can treat items more fairly without sacrificing accuracy. Current systems tend to over-promote already-popular items, while niche “long-tail” items are rarely shown. Existing fairness-aware methods usually apply a fixed rule and are evaluated in static, one-shot settings. The authors first show that preferences for popular versus long-tail items vary widely across users and also change over time (“spatiotemporal heterogeneity”). Building on this, they propose HER4IF, a hierarchical reinforcement learning framework. A high-level agent dynamically decides how much exposure popular and long-tail items should receive for each user at each point in time, while a low-level agent recommends concrete items under that constraint. A time-forgetting model and contrastive learning are used to track each user’s popularity preference more accurately and to mitigate data sparsity. Experiments on three public datasets and the KuaiSim simulator show that HER4IF can improve fairness and accuracy at the same time.
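To make the two-level idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the learned high-level and low-level agents are replaced by simple hand-written rules, and the function names, the exponential half-life decay, and the `floor`/`ceil` bounds are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def decayed_longtail_preference(
    history: List[Tuple[float, bool]], now: float, half_life: float = 7.0
) -> float:
    """Time-weighted share of a user's past interactions that were long-tail.

    `history` holds (timestamp_in_days, was_long_tail) pairs. Older events
    count less via an exponential half-life decay, which is an assumed
    stand-in for the paper's time-forgetting model.
    """
    weighted_tail = 0.0
    weighted_total = 0.0
    for timestamp, was_long_tail in history:
        weight = 0.5 ** ((now - timestamp) / half_life)
        weighted_total += weight
        if was_long_tail:
            weighted_tail += weight
    # With no history, fall back to a neutral preference.
    return weighted_tail / weighted_total if weighted_total > 0 else 0.5


def high_level_quota(
    preference: float, slate_size: int, floor: float = 0.1, ceil: float = 0.6
) -> int:
    """Hypothetical high-level rule: map the preference to a long-tail quota.

    In the paper this decision is made by a learned agent; here the
    preference is simply clamped to [floor, ceil] and scaled to the slate.
    """
    share = min(max(preference, floor), ceil)
    return round(share * slate_size)


def low_level_slate(
    scores: Dict[str, float], long_tail: Set[str], slate_size: int, quota: int
) -> List[str]:
    """Hypothetical low-level rule: best-scoring items that respect the quota."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    tail_picks = [item for item in ranked if item in long_tail][:quota]
    head_picks = [item for item in ranked if item not in long_tail]
    head_picks = head_picks[: slate_size - len(tail_picks)]
    chosen = set(tail_picks) | set(head_picks)
    # Present the chosen items in score order.
    return [item for item in ranked if item in chosen]


if __name__ == "__main__":
    history = [(0.0, False), (7.0, True)]  # one old popular click, one recent long-tail click
    pref = decayed_longtail_preference(history, now=7.0)
    quota = high_level_quota(pref, slate_size=4)
    scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.4, "e": 0.3, "f": 0.2}
    print(low_level_slate(scores, long_tail={"d", "e", "f"}, slate_size=4, quota=quota))
```

The point of the sketch is the division of labor: the quota changes per user and over time as the decayed preference shifts, while item selection only has to rank within that quota.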
Why is it important?
Recommendation platforms strongly shape what users see and what creators earn. If algorithms keep amplifying already-popular items, exposure becomes highly skewed, reinforcing “the rich get richer” dynamics and hurting ecosystem diversity. Prior work typically treats fairness as a trade-off: accuracy is sacrificed to help long-tail items. This paper argues and demonstrates that such a trade-off is not inevitable. By explicitly modeling how tolerance for popularity varies across users and over time, the system can decide when a given user is ready to accept more long-tail content without harming engagement. HER4IF operationalizes this via hierarchical RL and dynamic fairness constraints, showing consistent gains in both exposure fairness and click-based metrics across datasets and in an interactive simulator. Practically, this offers platform designers a principled way to improve long-term ecosystem health by supporting small or new items while still meeting short-term business KPIs. Conceptually, it reframes fairness in recommendation as a sequential, user-adaptive control problem rather than a static regularizer.
Perspectives
Future work could extend HER4IF beyond popularity to incorporate item quality, creator fairness, and user-side group fairness. The hierarchical RL framework also invites richer high-level goals (e.g., diversity, serendipity, or creator churn) and more realistic user models in simulators. Finally, integrating causal modeling with dynamic fairness control may further clarify when and how fairness interventions truly help users and items over the long run.
Chongjun Xia
Read the Original
This page is a summary of: Beyond Trade-offs: Leveraging Spatiotemporal Heterogeneity of User Preference for Long-term Fairness and Accuracy in Interactive Recommendation, ACM Transactions on the Web, September 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3769471.