All Stories

  1. Holistic Optimization Framework for FPGA Accelerators
  2. MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers
  3. ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems
  4. Reaction Latency Analysis of Message Synchronization in Edge-assisted Autonomous Driving
  5. Invited: Coping with Interconnects
  6. Using a multilevel framework to solve quantum layout synthesis problem
  7. Stream-HLS: Towards Automatic Dataflow Acceleration
  8. ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
  9. A Unified Framework for Automated Code Transformation and Pragma Insertion
  10. InTRRA: Inter-Task Resource-Repurposing Accelerator for Efficient Transformer Inference on FPGAs
  11. SAT-Accel: A Modern SAT Solver on a FPGA
  12. Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling
  13. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  14. FiberFlex: FPGA-based Intelligent & Distributed Fiber Sensor System for Pedestrian Recognition
  15. Amortizing Embodied Carbon Across Generations
  16. CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs
  17. EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture
  18. Efficient Task Transfer for HLS DSE
  19. Quantum State Preparation Circuit Optimization Exploiting Don't Cares
  20. GNN-Based Performance Prediction of Quantum Optimization of Maximum Independent Set
  21. RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis
  22. Reducing Smart Phone Environmental Footprints with In-Memory Processing
  23. Learning to Compare Hardware Designs for High-Level Synthesis
  24. Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
  25. PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs
  26. CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
  27. SCARIF: Towards Carbon Modeling of Cloud Servers with Accelerators
  28. Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas
  29. Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis
  30. SpectraFlux: Harnessing the Flow of Multi-FPGA in Mass Spectrometry Clustering
  31. TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs
  32. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  33. SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
  34. FPGA-based Accelerator for Sparse Triangular Solver
  35. Scheduling and Physical Design
  36. Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets
  37. REFRESH FPGAs: Sustainable FPGA Chiplet Architectures
  38. AIM: Accelerating Arbitrary-Precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP
  39. Efficient Hardware and Software Design for On-device Learning
  40. TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design
  41. Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  42. NeSSA: Near-Storage Data Selection for Accelerated Machine Learning Training
  43. High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives
  44. Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition
  45. Scalable Optimal Layout Synthesis for NISQ Quantum Processors
  46. Lightning Talk: Scaling Up Quantum Compilation – Challenges and Opportunities
  47. A Comprehensive Automated Exploration Framework for Systolic Array Designs
  48. RapidStream 2.0: Automated Parallel Implementation of Latency Insensitive FPGA Designs Through Partial Reconfiguration
  49. FPGA Acceleration of Probabilistic Sentential Decision Diagrams with High-level Synthesis
  50. FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA
  51. HMLib: Efficient Data Transfer for HLS Using Host Memory
  52. Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver
  53. CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
  54. Sustainable AI Processing at the Edge
  55. FPGA HLS Today: Successes, Challenges, and Opportunities
  56. Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream
  57. Qubit Mapping for Reconfigurable Atom Arrays
  58. OverGen: Improving FPGA Usability through Domain-specific Overlay Generation
  59. EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization
  60. Energy-Efficient LSTM Inference Accelerator for Real-Time Causal Prediction
  61. N-DISE
  62. AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators
  63. Serpens
  64. Automated accelerator optimization aided by graph neural networks
  65. Improving GNN-based accelerator design automation with meta learning
  66. H2H
  67. Automated Accelerator Optimization Aided by Graph Neural Networks
  68. SPA-GCN: Efficient and Flexible GCN Accelerator with Application for Graph Similarity Computation
  69. Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
  70. Accelerating SSSP for Power-Law Graphs
  71. RapidStream
  72. Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices
  73. TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation
  74. AutoBridge
  75. Extending High-Level Synthesis for Task-Parallel Programs
  76. HBM Connect: High-Performance HLS Interconnect for FPGA HBM
  77. MOCHA
  78. AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators
  79. AutoSA
  80. BLINK
  81. HeteroRefactor
  82. Bonsai: High-Performance Adaptive Merge Tree Sorting
  83. Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit
  84. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  85. Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management
  86. Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis
  87. LANMC
  88. HeteroCL
  89. Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management
  90. HLS-based optimization and design space exploration for applications with variable loop bounds
  91. PolySA
  92. SODA
  93. TGPA
  94. Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework
  95. ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA
  96. Latte: Locality Aware Transformation for High-Level Synthesis
  97. CPU-FPGA Co-Optimization for Big Data Applications
  98. Bandwidth Optimization Through On-Chip Memory Restructuring for HLS
  99. Caffeine
  100. Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster
  101. Invited - Heterogeneous datacenters
  102. Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication
  103. ARAPrototyper
  104. InterFS
  105. CMOST
  106. On-chip interconnection network for accelerator-rich architectures
  107. A Fully Pipelined and Dynamically Composable Architecture of CGRA
  108. Automatic memory partitioning and scheduling for throughput and power optimization