All Stories

  1. AgRefactor: Refactoring for HLS Compatibility with a Self-Evolving Agentic Workflow
  2. AGILE: Lightweight and Efficient Asynchronous GPU-SSD Integration
  3. Holistic Optimization Framework for FPGA Accelerators
  4. MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers
  5. ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems
  6. Assessing Quantum Layout Synthesis Tools via Known Optimal-SWAP Cost Benchmarks
  7. Reaction Latency Analysis of Message Synchronization in Edge-assisted Autonomous Driving
  8. Invited: Coping with Interconnects
  9. Using a multilevel framework to solve quantum layout synthesis problem
  10. Stream-HLS: Towards Automatic Dataflow Acceleration
  11. ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
  12. A Unified Framework for Automated Code Transformation and Pragma Insertion
  13. InTRRA: Inter-Task Resource-Repurposing Accelerator for Efficient Transformer Inference on FPGAs
  14. SAT-Accel: A Modern SAT Solver on a FPGA
  15. Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling
  16. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  17. FiberFlex: FPGA-based Intelligent & Distributed Fiber Sensor System for Pedestrian Recognition
  18. Amortizing Embodied Carbon Across Generations
  19. CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs
  20. EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture
  21. Efficient Task Transfer for HLS DSE
  22. Quantum State Preparation Circuit Optimization Exploiting Don't Cares
  23. GNN-Based Performance Prediction of Quantum Optimization of Maximum Independent Set
  24. RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis
  25. Reducing Smart Phone Environmental Footprints with In-Memory Processing
  26. Learning to Compare Hardware Designs for High-Level Synthesis
  27. Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
  28. PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs
  29. CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
  30. SCARIF: Towards Carbon Modeling of Cloud Servers with Accelerators
  31. Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas
  32. Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis
  33. SpectraFlux: Harnessing the Flow of Multi-FPGA in Mass Spectrometry Clustering
  34. TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs
  35. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  36. SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
  37. FPGA-based Accelerator for Sparse Triangular Solver
  38. Scheduling and Physical Design
  39. Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets
  40. REFRESH FPGAs: Sustainable FPGA Chiplet Architectures
  41. AIM: Accelerating Arbitrary-Precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP
  42. Efficient Hardware and Software Design for On-device Learning
  43. TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design
  44. Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  45. NeSSA: Near-Storage Data Selection for Accelerated Machine Learning Training
  46. High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives
  47. Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition
  48. Scalable Optimal Layout Synthesis for NISQ Quantum Processors
  49. Lightning Talk: Scaling Up Quantum Compilation – Challenges and Opportunities
  50. A Comprehensive Automated Exploration Framework for Systolic Array Designs
  51. RapidStream 2.0: Automated Parallel Implementation of Latency Insensitive FPGA Designs Through Partial Reconfiguration
  52. FPGA Acceleration of Probabilistic Sentential Decision Diagrams with High-level Synthesis
  53. FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA
  54. HMLib: Efficient Data Transfer for HLS Using Host Memory
  55. Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver
  56. CHARM: Composing Heterogeneous Accelerators for Matrix Multiply on Versal ACAP Architecture
  57. Sustainable AI Processing at the Edge
  58. FPGA HLS Today: Successes, Challenges, and Opportunities
  59. Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream
  60. Qubit Mapping for Reconfigurable Atom Arrays
  61. OverGen: Improving FPGA Usability through Domain-specific Overlay Generation
  62. EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization
  63. Energy-Efficient LSTM Inference Accelerator for Real-Time Causal Prediction
  64. N-DISE
  65. AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators
  66. Serpens
  67. Automated accelerator optimization aided by graph neural networks
  68. Improving GNN-based accelerator design automation with meta learning
  69. H2H
  70. Automated Accelerator Optimization Aided by Graph Neural Networks
  71. SPA-GCN: Efficient and Flexible GCN Accelerator with Application for Graph Similarity Computation
  72. Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
  73. Accelerating SSSP for Power-Law Graphs
  74. RapidStream
  75. Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices
  76. TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation
  77. AutoBridge
  78. Extending High-Level Synthesis for Task-Parallel Programs
  79. HBM Connect: High-Performance HLS Interconnect for FPGA HBM
  80. MOCHA
  81. AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators
  82. AutoSA
  83. BLINK
  84. HeteroRefactor
  85. Bonsai: High-Performance Adaptive Merge Tree Sorting
  86. Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit
  87. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  88. Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management
  89. Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis
  90. LANMC
  91. HeteroCL
  92. Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management
  93. HLS-based optimization and design space exploration for applications with variable loop bounds
  94. PolySA
  95. SODA
  96. TGPA
  97. Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework
  98. ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA
  99. Latte: Locality Aware Transformation for High-Level Synthesis
  100. CPU-FPGA Co-Optimization for Big Data Applications
  101. Bandwidth Optimization Through On-Chip Memory Restructuring for HLS
  102. Caffeine
  103. Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster
  104. Invited - Heterogeneous datacenters
  105. Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication
  106. ARAPrototyper
  107. InterFS
  108. CMOST
  109. On-chip interconnection network for accelerator-rich architectures
  110. A Fully Pipelined and Dynamically Composable Architecture of CGRA
  111. Automatic memory partitioning and scheduling for throughput and power optimization