All Stories

  1. AGILE: Lightweight and Efficient Asynchronous GPU-SSD Integration
  2. Holistic Optimization Framework for FPGA Accelerators
  3. MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers
  4. ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems
  5. Assessing Quantum Layout Synthesis Tools via Known Optimal-SWAP Cost Benchmarks
  6. Reaction Latency Analysis of Message Synchronization in Edge-assisted Autonomous Driving
  7. Invited: Coping with Interconnects
  8. Using a multilevel framework to solve quantum layout synthesis problem
  9. Stream-HLS: Towards Automatic Dataflow Acceleration
  10. ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
  11. A Unified Framework for Automated Code Transformation and Pragma Insertion
  12. InTRRA: Inter-Task Resource-Repurposing Accelerator for Efficient Transformer Inference on FPGAs
  13. SAT-Accel: A Modern SAT Solver on a FPGA
  14. Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling
  15. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  16. FiberFlex: FPGA-based Intelligent & Distributed Fiber Sensor System for Pedestrian Recognition
  17. Amortizing Embodied Carbon Across Generations
  18. CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs
  19. EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture
  20. Efficient Task Transfer for HLS DSE
  21. Quantum State Preparation Circuit Optimization Exploiting Don't Cares
  22. GNN-Based Performance Prediction of Quantum Optimization of Maximum Independent Set
  23. RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis
  24. Reducing Smart Phone Environmental Footprints with In-Memory Processing
  25. Learning to Compare Hardware Designs for High-Level Synthesis
  26. Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
  27. PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs
  28. CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
  29. SCARIF: Towards Carbon Modeling of Cloud Servers with Accelerators
  30. Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas
  31. Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis
  32. SpectraFlux: Harnessing the Flow of Multi-FPGA in Mass Spectrometry Clustering
  33. TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs
  34. Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
  35. SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
  36. FPGA-based Accelerator for Sparse Triangular Solver
  37. Scheduling and Physical Design
  38. Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets
  39. REFRESH FPGAs: Sustainable FPGA Chiplet Architectures
  40. AIM: Accelerating Arbitrary-Precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP
  41. Efficient Hardware and Software Design for On-device Learning
  42. TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design
  43. Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  44. NeSSA: Near-Storage Data Selection for Accelerated Machine Learning Training
  45. High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives
  46. Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition
  47. Scalable Optimal Layout Synthesis for NISQ Quantum Processors
  48. Lightning Talk: Scaling Up Quantum Compilation – Challenges and Opportunities
  49. A Comprehensive Automated Exploration Framework for Systolic Array Designs
  50. RapidStream 2.0: Automated Parallel Implementation of Latency Insensitive FPGA Designs Through Partial Reconfiguration
  51. FPGA Acceleration of Probabilistic Sentential Decision Diagrams with High-level Synthesis
  52. FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA
  53. HMLib: Efficient Data Transfer for HLS Using Host Memory
  54. Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver
  55. CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
  56. Sustainable AI Processing at the Edge
  57. FPGA HLS Today: Successes, Challenges, and Opportunities
  58. Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream
  59. Qubit Mapping for Reconfigurable Atom Arrays
  60. OverGen: Improving FPGA Usability through Domain-specific Overlay Generation
  61. EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization
  62. Energy-Efficient LSTM Inference Accelerator for Real-Time Causal Prediction
  63. N-DISE
  64. AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators
  65. Serpens
  66. Automated accelerator optimization aided by graph neural networks
  67. Improving GNN-based accelerator design automation with meta learning
  68. H2H
  69. Automated Accelerator Optimization Aided by Graph Neural Networks
  70. SPA-GCN: Efficient and Flexible GCN Accelerator with Application for Graph Similarity Computation
  71. Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
  72. Accelerating SSSP for Power-Law Graphs
  73. RapidStream
  74. Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices
  75. TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation
  76. AutoBridge
  77. Extending High-Level Synthesis for Task-Parallel Programs
  78. HBM Connect: High-Performance HLS Interconnect for FPGA HBM
  79. MOCHA
  80. AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators
  81. AutoSA
  82. BLINK
  83. HeteroRefactor
  84. Bonsai: High-Performance Adaptive Merge Tree Sorting
  85. Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit
  86. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  87. Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management
  88. Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis
  89. LANMC
  90. HeteroCL
  91. Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management
  92. HLS-based optimization and design space exploration for applications with variable loop bounds
  93. PolySA
  94. SODA
  95. TGPA
  96. Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework
  97. ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA
  98. Latte: Locality Aware Transformation for High-Level Synthesis
  99. CPU-FPGA Co-Optimization for Big Data Applications
  100. Bandwidth Optimization Through On-Chip Memory Restructuring for HLS
  101. Caffeine
  102. Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster
  103. Invited - Heterogeneous datacenters
  104. Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication
  105. ARAPrototyper
  106. InterFS
  107. CMOST
  108. On-chip interconnection network for accelerator-rich architectures
  109. A Fully Pipelined and Dynamically Composable Architecture of CGRA
  110. Automatic memory partitioning and scheduling for throughput and power optimization