What is it about?

Modern applications like real-time control and low-power sensing need machine-learning models that run extremely fast with minimal hardware. This paper introduces KANELÉ, a workflow that turns Kolmogorov–Arnold Networks (KANs) into FPGA designs built almost entirely from lookup tables (LUTs) and additions. Instead of relying on heavy arithmetic, KANs use many learnable one-dimensional functions; KANELÉ discretizes these functions so each one becomes a small LUT that the FPGA can evaluate efficiently.
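To make the core trick concrete, here is a minimal Python sketch of the idea as framed above: each learned 1-D function is sampled on a fixed grid to produce a small table, and a neuron's output is then just a sum of table lookups. The names (`build_lut`, `kan_neuron`) and the uniform sampling scheme are illustrative assumptions, not KANELÉ's actual API.

```python
import numpy as np

def build_lut(phi, x_min, x_max, bits=8):
    """Tabulate a learned 1-D function phi over [x_min, x_max] at
    2**bits evenly spaced points -- one table entry per input code.
    (Illustrative sketch, not KANELE's API.)"""
    grid = np.linspace(x_min, x_max, 2 ** bits)
    return phi(grid)

def kan_neuron(luts, x_codes):
    """One KAN output: a sum of LUT lookups, one per incoming edge.
    x_codes are the quantized integer codes of the inputs."""
    return sum(lut[code] for lut, code in zip(luts, x_codes))

# Toy example with two edges and stand-in "learned" functions.
lut_a = build_lut(np.sin, -3.0, 3.0)
lut_b = build_lut(np.tanh, -3.0, 3.0)
y = kan_neuron([lut_a, lut_b], [5, 200])
```

On an FPGA, each table maps onto the LUT fabric and the sum maps onto adders, which is why no multipliers (DSPs) are needed on that path.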

Why is it important?

1. Makes KANs practical on hardware: Prior work suggested KANs were too expensive on FPGAs because of spline-evaluation costs; this paper shows that reformulating inference directly as LUT lookups makes KANs both feasible and highly efficient.
2. Major speed and footprint gains: The framework reports up to ~2700× lower latency and >4000× lower resource use than an earlier KAN-on-FPGA approach, while eliminating the need for DSPs and BRAM in several implementations.
3. Competitive with state-of-the-art LUT-NNs: On established LUT-based FPGA ML benchmarks (e.g., jet tagging and MNIST), KANELÉ offers a strong accuracy–efficiency trade-off and often sits on the Pareto frontier for area × delay.
4. Broader than classification: The paper also demonstrates a route to real-time control (reinforcement-learning policy inference), with an 8-bit KAN actor achieving strong rewards while remaining extremely low-latency on FPGA; a sketch of the input quantization this implies follows this list.
5. Reproducible and extensible: The authors provide an automated, open workflow that compiles trained KANs into FPGA-ready designs quickly, supporting experiments across multiple domains.
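As a companion to item 4, here is a hedged sketch of how a real-valued observation could be mapped to an 8-bit address into a 256-entry LUT. The clip-then-scale scheme and the name `quantize_u8` are illustrative assumptions, not necessarily the exact quantizer used in the paper.

```python
def quantize_u8(x, x_min, x_max):
    """Map a real input to an 8-bit LUT address. Clip-then-scale is
    one plausible scheme (an assumption, not the paper's confirmed
    quantizer)."""
    x = min(max(x, x_min), x_max)           # clip to the tabulated range
    scale = 255.0 / (x_max - x_min)
    return int(round((x - x_min) * scale))  # integer code in 0..255

# Example: an observation of 0.7 on a [-1, 1] range -> address 217.
addr = quantize_u8(0.7, -1.0, 1.0)
```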

Perspectives

I’m interested in bridging the gap between promising new model families and real deployment constraints. KANs are attractive because they represent complex relationships using many simple 1-D functions, but that same structure has been viewed as a barrier to efficient hardware implementation. In this work, I focused on turning that structure into an advantage: by discretizing each learned 1-D function and mapping it directly onto the FPGA’s LUT fabric, inference becomes closer to “configuring logic” than emulating arithmetic. What I find most compelling is that pruning is naturally compatible with KANs’ additive structure: removing an edge cleanly removes a LUT contribution, so we can co-design training (quantization + pruning) and hardware (pipelined adders + LUT evaluation) to reach very low latency without sacrificing practical accuracy (a small sketch of this pruning idea follows below). I see this as a step toward interpretable, resource-efficient ML that can run in tight real-time loops, including control applications where power and latency matter as much as raw accuracy.
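To illustrate why pruning composes so cleanly with the additive structure, here is a small sketch: a removed edge simply drops out of the sum, taking its table with it. The magnitude-based pruning criterion and the names (`prune_edges`, `kan_neuron_pruned`) are my own illustration, not the training procedure from the paper.

```python
import numpy as np

def prune_edges(luts, threshold=1e-3):
    """Keep (input_index, table) pairs whose tabulated function is not
    effectively zero. The magnitude test is an illustrative criterion."""
    return [(i, lut) for i, lut in enumerate(luts)
            if np.max(np.abs(lut)) > threshold]

def kan_neuron_pruned(edges, x_codes):
    # Additive structure: a pruned edge vanishes from the sum,
    # removing one LUT and one adder input with no rewiring of the rest.
    return sum(lut[x_codes[i]] for i, lut in edges)
```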

Duc Hoang
Massachusetts Institute of Technology

Read the Original

This page is a summary of: KANELÉ: Kolmogorov–Arnold Networks for Efficient LUT-based Evaluation, February 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3748173.3779202.