What is it about?
Click-through rate (CTR) prediction plays a pivotal role in online advertising and recommendation systems, where even small improvements can significantly boost revenue. Existing research primarily focuses on designing dual-stream architectures that capture complex feature interactions from both explicit and implicit perspectives. However, these approaches face two major challenges: 1) the high complexity of feature interaction learning, which increases computational demands and the risk of overfitting, and 2) the imbalance between explicit and implicit modules, where one module's output may dominate the final prediction. To address these issues, we propose Dual-Stream MLP (DS-MLP), a novel feature interaction framework for the CTR prediction task. Specifically, it uses knowledge distillation to consolidate the capacity for learning explicit feature interactions into a main MLP network, while a parallel MLP simultaneously captures implicit feature interactions as a complement. To optimize the dual-stream MLP architecture effectively, we further design a learning approach with two alignment strategies that improve the compatibility of the two MLP components. Experiments demonstrate that DS-MLP, although its final model is merely a vanilla MLP structure, achieves state-of-the-art performance across three widely used benchmarks, offering a scalable and efficient solution for large-scale recommendation systems. Our code is available at https://github.com/RUCAIBox/DS-MLP.
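To make the distillation idea concrete, here is a minimal sketch of one common way to combine a click-prediction loss with a term that pulls a student MLP toward a teacher's output. It is an illustration under stated assumptions, not the objective used in the paper: the squared-error matching term, the weight `alpha`, and the function names are all hypothetical.

```python
import math


def bce(label, logit):
    """Binary cross-entropy for one example, computed stably from a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def distillation_loss(label, student_logit, teacher_logit, alpha=0.5):
    """Task loss plus a penalty for disagreeing with the teacher.

    Assumption: the teacher is some explicit-interaction model and the
    student is the main MLP. alpha is a hypothetical weight.
    """
    task = bce(label, student_logit)                 # fit the click label
    mimic = (student_logit - teacher_logit) ** 2     # match the teacher's logit
    return task + alpha * mimic
```

When the student already agrees with the teacher, the matching term vanishes and only the ordinary click loss remains; the further the two logits drift apart, the larger the penalty.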
Why is it important?
Most CTR prediction models rely on heavy dual-stream architectures that explicitly enumerate high-order feature interactions, which leads to high computational costs, overfitting risks, and imbalanced fusion between explicit and implicit modules. Our work, DS-MLP, breaks away from this trend by showing that a vanilla MLP with very few layers can achieve state-of-the-art performance when trained with a carefully designed distillation-and-alignment procedure. The uniqueness lies in our insight: instead of making the model more complex, we transfer the capability of learning explicit interactions into a main MLP via knowledge distillation, while a parallel MLP complements it with implicit interactions. Two lightweight alignment strategies (batch normalization and direct task supervision) harmonize the two streams without introducing expensive operations. What makes this timely is the growing demand for scalable, low-latency solutions in industrial recommender systems. As models become deeper and wider, deploying them on resource-constrained platforms (e.g., real-time ad servers) becomes challenging. DS-MLP offers a simple, efficient, yet capable alternative, showing that “less can be more” when optimization is guided by the right principles. Our open-source code further lowers the barrier to adoption. This work can help shift the community’s focus from architecturally complex designs to smarter training strategies, potentially reducing the carbon footprint of large-scale recommendation systems while maintaining or even improving accuracy.
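The two alignment strategies named above can be illustrated with a small sketch. The summary does not say where batch normalization is applied or how the supervision terms are weighted, so everything here is an assumption made for illustration: each stream's logits are standardized across the batch before being added, and the fused output plus each individual stream is supervised with the click label. The function names and the equal weighting are hypothetical.

```python
import math


def batch_norm(values, eps=1e-5):
    """Standardize a batch of scalars to zero mean and unit variance
    (no learned scale or shift, to keep the sketch short)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def fuse(main_logits, aux_logits):
    """Normalize each stream so neither dominates by scale, then add them."""
    return [m + a for m, a in zip(batch_norm(main_logits), batch_norm(aux_logits))]


def bce(label, logit):
    """Binary cross-entropy for one example, computed stably from a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(labels, logits):
    return sum(bce(y, z) for y, z in zip(labels, logits)) / len(labels)


def total_loss(labels, main_logits, aux_logits):
    """Supervise the fused prediction and each stream directly (equal weights assumed)."""
    fused = fuse(main_logits, aux_logits)
    return (mean_bce(labels, fused)
            + mean_bce(labels, main_logits)
            + mean_bce(labels, aux_logits))
```

In this sketch, a stream whose raw outputs are a hundred times larger than the other's contributes roughly the same amount to the fused logit after normalization, which is one way to read the "imbalanced fusion" problem the text describes.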
Perspectives
Writing this paper was both a challenging and rewarding journey. For a long time, I’ve been fascinated by the question: Why do simple models often fail, and can we make them succeed without adding complexity? DS-MLP is my answer. Seeing a plain MLP, often dismissed as “too simple” for CTR prediction, consistently outperform sophisticated models like DeepFM and xDeepFM across three benchmarks was truly exhilarating. What I hope readers take away from this work is not just a new model, but a shift in mindset. You don’t always need to invent another complicated module or stack more cross layers. Sometimes, the right learning recipe (distillation → alignment → optimization) can unlock hidden potential in the simplest architectures. I also hope this paper encourages researchers and practitioners to look more closely at imbalanced fusion, a subtle but critical issue that many prior works overlook. Finally, I’m deeply grateful to my collaborators. I’m excited to see how DS-MLP evolves in real-world deployments. If this work makes even one person rethink their go-to complex model, I’ll consider it a success.
Kesha Ou
Renmin University of China
Read the Original
This page is a summary of: Dual-Stream MLP is All You Need for CTR Prediction, ACM Transactions on Knowledge Discovery from Data, May 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3819238.