All Stories

  1. A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware
  2. AgRefactor: Refactoring for HLS Compatibility with a Self-Evolving Agentic Workflow
  3. AGILE: Lightweight and Efficient Asynchronous GPU-SSD Integration
  4. Holistic Optimization Framework for FPGA Accelerators
  5. MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers
  6. ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems
  7. Assessing Quantum Layout Synthesis Tools via Known Optimal-SWAP Cost Benchmarks
  8. Reaction Latency Analysis of Message Synchronization in Edge-assisted Autonomous Driving
  9. Invited: Coping with Interconnects
  10. Using a multilevel framework to solve quantum layout synthesis problem
  11. Stream-HLS: Towards Automatic Dataflow Acceleration
  12. ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
  13. A Unified Framework for Automated Code Transformation and Pragma Insertion
  14. InTRRA: Inter-Task Resource-Repurposing Accelerator for Efficient Transformer Inference on FPGAs
  15. SAT-Accel: A Modern SAT Solver on a FPGA
  16. Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling
  17. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  18. FiberFlex: FPGA-based Intelligent & Distributed Fiber Sensor System for Pedestrian Recognition
  19. Amortizing Embodied Carbon Across Generations
  20. CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs
  21. EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture
  22. Efficient Task Transfer for HLS DSE
  23. Quantum State Preparation Circuit Optimization Exploiting Don't Cares
  24. GNN-Based Performance Prediction of Quantum Optimization of Maximum Independent Set
  25. RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis
  26. Reducing Smart Phone Environmental Footprints with In-Memory Processing
  27. Learning to Compare Hardware Designs for High-Level Synthesis
  28. Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
  29. PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs
  30. CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
  31. SCARIF: Towards Carbon Modeling of Cloud Servers with Accelerators
  32. Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas
  33. Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis
  34. SpectraFlux: Harnessing the Flow of Multi-FPGA in Mass Spectrometry Clustering
  35. TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs
  36. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  37. SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
  38. FPGA-based Accelerator for Sparse Triangular Solver
  39. Scheduling and Physical Design
  40. Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets
  41. REFRESH FPGAs: Sustainable FPGA Chiplet Architectures
  42. AIM: Accelerating Arbitrary-Precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP
  43. Efficient Hardware and Software Design for On-device Learning
  44. TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design
  45. Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  46. NeSSA: Near-Storage Data Selection for Accelerated Machine Learning Training
  47. High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives
  48. Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition
  49. Scalable Optimal Layout Synthesis for NISQ Quantum Processors
  50. Lightning Talk: Scaling Up Quantum Compilation – Challenges and Opportunities
  51. A Comprehensive Automated Exploration Framework for Systolic Array Designs
  52. RapidStream 2.0: Automated Parallel Implementation of Latency Insensitive FPGA Designs Through Partial Reconfiguration
  53. FPGA Acceleration of Probabilistic Sentential Decision Diagrams with High-level Synthesis
  54. FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA
  55. HMLib: Efficient Data Transfer for HLS Using Host Memory
  56. Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver
  57. CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
  58. Sustainable AI Processing at the Edge
  59. FPGA HLS Today: Successes, Challenges, and Opportunities
  60. Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream
  61. Qubit Mapping for Reconfigurable Atom Arrays
  62. OverGen: Improving FPGA Usability through Domain-specific Overlay Generation
  63. EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization
  64. Energy-Efficient LSTM Inference Accelerator for Real-Time Causal Prediction
  65. N-DISE
  66. AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators
  67. Serpens
  68. Automated accelerator optimization aided by graph neural networks
  69. Improving GNN-based accelerator design automation with meta learning
  70. H2H
  71. Automated Accelerator Optimization Aided by Graph Neural Networks
  72. SPA-GCN: Efficient and Flexible GCN Accelerator with Application for Graph Similarity Computation
  73. Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
  74. Accelerating SSSP for Power-Law Graphs
  75. RapidStream
  76. Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices
  77. TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation
  78. AutoBridge
  79. Extending High-Level Synthesis for Task-Parallel Programs
  80. HBM Connect: High-Performance HLS Interconnect for FPGA HBM
  81. MOCHA
  82. AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators
  83. AutoSA
  84. BLINK
  85. HeteroRefactor
  86. Bonsai: High-Performance Adaptive Merge Tree Sorting
  87. Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit
  88. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  89. Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management
  90. Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis
  91. LANMC
  92. HeteroCL
  93. Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management
  94. HLS-based optimization and design space exploration for applications with variable loop bounds
  95. PolySA
  96. SODA
  97. TGPA
  98. Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework
  99. ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA
  100. Latte: Locality Aware Transformation for High-Level Synthesis
  101. CPU-FPGA Co-Optimization for Big Data Applications
  102. Bandwidth Optimization Through On-Chip Memory Restructuring for HLS
  103. Caffeine
  104. Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster
  105. Invited - Heterogeneous datacenters
  106. Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication
  107. ARAPrototyper
  108. InterFS
  109. CMOST
  110. On-chip interconnection network for accelerator-rich architectures
  111. A Fully Pipelined and Dynamically Composable Architecture of CGRA
  112. Automatic memory partitioning and scheduling for throughput and power optimization