What is it about?
This work provides a highly efficient iterative flow that searches for the best HLS directive configuration for a specific design, producing a Verilog design with low latency (number of clock cycles) and high frequency under a floorplan constraint on multi-die FPGAs. Its conference version, FADO 1.0, published at FPGA '23, used a synthesis-based QoR library built from HLS-reported values, whereas this extension, the FADO 2.0 flow, applies an analytical model that improves both the turnaround time and the performance of the generated designs.
Why is it important?
The difficulties are two-fold. 1. Directive search: (1.1) The design space is large. For example, Vitis HLS 2020.2 offers 26 directives, each with numerous possible parameter settings. (1.2) HLS directives and QoR metrics (latency, resource usage, etc.) have non-monotonic relationships, so the design space exploration (DSE) algorithm must avoid getting trapped in local optima. (1.3) Previous directive-search works consider only the overall resource budget of an FPGA, which makes them suitable only for single-die FPGAs. 2. Floorplanning: Floorplanning on multi-die FPGAs has been shown to be an effective way to maximize the achievable frequency. However, previous state-of-the-art (SOTA) approaches rely on an MILP solver to find an ideal floorplan, which is too time-consuming. Hence, we design a highly efficient iterative flow that integrates incremental floorplan legalization with latency-bottleneck-guided directive search. It achieves an orders-of-magnitude faster co-search than the previous SOTA while also delivering better QoR in both latency (cycles) and frequency (MHz).
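To make the idea of such an iterative co-search concrete, here is a minimal toy sketch in Python. It is not the FADO implementation. The `Module` class, the `legalize` and `co_search` functions, the latency and resource numbers, the single per-die capacity, and the assumption that total latency is the sum of module latencies are all hypothetical simplifications chosen for illustration. The loop repeatedly picks the slowest module that can still be upgraded, tries its next faster directive configuration, and then repairs the floorplan by moving only that module if its die overflows.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    # Hypothetical directive configurations as (latency_cycles, resource_units),
    # ordered from slowest/smallest to fastest/largest.
    options: list
    choice: int = 0  # index of the currently selected configuration
    die: int = 0     # die the module is currently placed on

    @property
    def latency(self):
        return self.options[self.choice][0]

    @property
    def resource(self):
        return self.options[self.choice][1]


def die_usage(modules, num_dies):
    """Sum the resource usage of the modules placed on each die."""
    usage = [0] * num_dies
    for m in modules:
        usage[m.die] += m.resource
    return usage


def legalize(modules, changed, capacity, num_dies):
    """Toy incremental legalization: only the changed module may move."""
    usage = die_usage(modules, num_dies)
    if usage[changed.die] <= capacity:
        return True  # still legal, nothing to repair
    for d in range(num_dies):
        if d != changed.die and usage[d] + changed.resource <= capacity:
            changed.die = d  # relocate just this one module
            return True
    return False  # no die can host the enlarged module


def co_search(modules, capacity, num_dies):
    """Greedy bottleneck-guided search; assumes the initial placement is legal."""
    frozen = set()  # modules whose next upgrade could not be legalized
    while True:
        candidates = [m for m in modules
                      if m.name not in frozen and m.choice + 1 < len(m.options)]
        if not candidates:
            break
        bottleneck = max(candidates, key=lambda m: m.latency)
        old_choice, old_die = bottleneck.choice, bottleneck.die
        bottleneck.choice += 1  # try the next faster configuration
        if not legalize(modules, bottleneck, capacity, num_dies):
            # Undo the move and stop trying to speed up this module.
            bottleneck.choice, bottleneck.die = old_choice, old_die
            frozen.add(bottleneck.name)
    # Assumption: modules run sequentially, so latencies add up.
    return sum(m.latency for m in modules)
```

Calling `co_search` on a small list of `Module` objects returns the total latency of the final configuration, and each module's `die` and `choice` fields record the resulting placement and selected configuration. A real flow would also have to model inter-die connections, multiple resource types, and frequency, none of which this sketch attempts.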
Perspectives
HLS design space exploration, usually understood as tuning the available directives for a given design, mainly adopts one of two methods: synthesis-based or model-based. For synthesis-based methods, we first observe that iteratively invoking HLS on the whole design is impractical, because performance optimization is expected to take hundreds or thousands of steps to converge. We then rule out machine learning for this task, because the HLS community currently lacks a comprehensive dataset with a balanced spread across all metrics, such as latency and resources. FADO 1.0 therefore builds a QoR library for sub-modules, and FADO 2.0 focuses on transferring a classic analytical model to our brand-new use case. Fortunately, things worked out as we expected: the analytical model with an improved search strategy achieves both faster search and better performance of the output design.
Linfeng Du
Hong Kong University of Science and Technology
Read the Original
This page is a summary of: FADO: Floorplan-Aware Directive Optimization Based on Synthesis and Analytical Models for High-Level Synthesis Designs on Multi-Die FPGAs, ACM Transactions on Reconfigurable Technology and Systems, March 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3653458.