All Stories

  1. Artifact Evaluation in the FPGA Community and in ACM TRETS
  2. High-Accuracy Polynomial Model Training for RF Power Amplifiers: Leveraging FISTA Unrolling
  3. Training Signal Optimization for Behavioral Modeling and Digital Predistortion of RF Power Amplifiers
  4. A Survey of FPGA-based 3D CNN Accelerators and Hardware-aware Algorithmic Optimizations
  5. Miniature: Fast AI Supercomputer Networks Simulation on FPGAs
  6. Transfer Learning on the Edge for a Wireless Application Using an SoC Platform
  7. LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference
  8. Realizing Network-Attached FPGAs in the Cloud
  9. Optimizing FPGA Memory Allocation for Matrix-Matrix Multiplication Using Bayesian Optimization
  10. Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function
  11. Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function
  12. Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function
  13. HotaQ: Hardware Oriented Token Adaptive Quantization for Large Language Models
  14. Minimizing Training Signal Length for Power Amplifier Characterization and Linearization
  15. Pets vs Cattle: Heterogeneous Systems in the 21st Century
  16. Artifact Evaluation for ACM TRETS Papers Submitted from the FPT Journal Track
  17. The Future of FPGA Acceleration in Datacenters and the Cloud
  18. Optimizing Designs Using Several Types of Memories on Modern FPGAs
  19. FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization
  20. Evaluating Theoretical Baselines for ML Benchmarking Across Different Accelerators
  21. Computationally Efficient Look-up-Tables for Behavioral Modelling and Digital Pre-distortion of Multi-standard Wireless Systems
  22. FPGAs in The Cloud
  23. FPGAs in the Cloud
  24. High‐performance transformation of protein structure representation from internal to Cartesian coordinates
  25. Strategies and Demonstration to Support Multiple Wireless Protocols with a Single RF Front-End
  26. A Novel Physical Layer Authentication With PAPR Reduction Based on Channel and Hardware Frequency Responses
  27. Real Time Receiver Baseband Processing Platform for Sub 6 GHz PHY Layer Experiments
  28. Evaluation of Optimized CNNs on Heterogeneous Accelerators using a Novel Benchmarking Approach
  29. SIFO: Secure Computational Infrastructure Using FPGA Overlays
  30. QuTiBench
  31. Garbled Circuits in the Cloud using FPGA Enabled Nodes
  32. Detection of Different Wireless Protocols on an FPGA with the Same Analog/RF Front End
  33. High-Level and Compact Design of Cross-Channel LTE DownLink Channel Encoder
  34. FINN-R
  35. Local and Global Shared Memory for Task Based HPC Applications on Heterogeneous Platforms
  36. Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic
  37. Accelerating big data applications using lightweight virtualization framework on enterprise cloud
  38. FPGA modeling techniques for detecting and demodulating multiple wireless protocols
  39. FIM: Performance Prediction for Parallel Computation in Iterative Data Processing Applications
  40. Secure Function Evaluation Using an FPGA Overlay Architecture
  41. A Framework for Developing Parallel Applications with high level Tasks on Heterogeneous Platforms
  42. Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms
  43. Performance prediction techniques for scalable large data processing in distributed MPI systems
  44. Open-Source Variable-Precision Floating-Point Library for Major Commercial FPGAs
  45. Design space exploration of GPU Accelerated cluster systems for optimal data transfer using PCIe bus
  46. Unified and lightweight tasks and conduits: A high level parallel programming framework
  47. Modeling considerations for the hardware-software co-design of flexible modern wireless transceivers
  48. State-Action Based Link Layer Design for IEEE 802.11b Compliant MATLAB-Based SDR
  49. High-level hardware-software co-design of an 802.11a transceiver system using Zynq SoC
  50. Cardiac MRI compressed sensing image reconstruction with a graphics processing unit
  51. High-Level System Design of IEEE 802.11b Standard-Compliant Link Layer for MATLAB-Based SDR
  52. Validity and reliability of Kinect skeleton for measuring shoulder joint angles: a feasibility study
  53. Accelerating K-Means clustering with parallel implementations and GPU computing
  54. GPU implementation of reverse coordinate conversion for proteins
  55. Leakage evaluation on power balance countermeasure against side-channel attack on FPGAs
  56. Balance power leakage to fight against side-channel analysis at gate level in FPGAs
  57. Side-channel analysis of MAC-Keccak hardware implementations
  58. Accuracy of Kinect for measuring shoulder joint angles in multiple planes of motion
  59. Kernel Specialization Provides Adaptable GPU Code for Particle Image Velocimetry
  60. Behavioral Non-portability in Scientific Numeric Computing
  61. Implementing a MATLAB-Based Self-configurable Software Defined Radio Transceiver
  62. Power analysis attack on hardware implementation of MAC-Keccak on FPGAs
  63. Accelerating protein coordinate conversion using GPUs
  64. Fast reconstruction of 3D volumes from 2D CT projection data with GPUs
  65. Reducing Processing Latency with a Heterogeneous FPGA-Processor Framework
  66. Validity and reliability of Kinect for measuring shoulder joint angles
  67. Make it real: Effective floating-point reasoning via exact arithmetic
  68. FPGA-based hyperspectral covariance coprocessor for size, weight, and power constrained platforms
  69. Vendor agnostic, high performance, double precision Floating Point division for FPGAs
  70. Development of a low-cost, adaptive, clinician-friendly virtual rehabilitation system
  71. Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
  72. A Message from the General Chair and Program Chair
  73. Minimum energy operation for clustered island-style FPGAs
  74. The effect of temporal impulse response on experimental reduction of photon scatter in time-resolved diffuse optical tomography
  75. Minimum Energy Analysis and Experimental Verification of a Latch-Based Subthreshold FPGA
  76. Characterization of a single-supply subthreshold FPGA
  77. Cognitive radio universal software hardware
  78. CUDA and OpenCL implementations of 3D CT reconstruction for biomedical imaging
  79. VForce: An environment for portable applications on high performance systems with accelerators
  80. CRUSH: Cognitive Radio Universal Software Hardware
  81. Heterogeneous tasks and conduits framework for rapid application portability and deployment
  82. Cognitive Radio Universal Software Hardware
  83. Incremental clustering applied to radar deinterleaving
  84. Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation
  85. An Autonomous Vector/Scalar Floating Point Coprocessor for FPGAs
  86. VFloat
  87. Efficient template matching with variable size templates in CUDA
  88. A truly two-dimensional systolic array FPGA implementation of QR decomposition
  89. Implementing a Highly Parameterized Digital PIV System on Reconfigurable Hardware
  90. Message from the ASAP '09 General and Technical Program Chairs
  91. Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA
  92. FPGA Supercomputing Platforms, Architectures, and Techniques for Accelerating Computationally Complex Algorithms
  93. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing
  94. Implementing phase unwrapping using Field Programmable Gate Arrays or Graphics Processing Units: A comparison
  95. Special issue: General-purpose processing using graphics processing units
  96. Efficient Communication Between the Embedded Processor and the Reconfigurable Logic on an FPGA
  97. An efficient implementation of a phase unwrapping kernel on reconfigurable hardware
  98. An FPGA Implementation of Explicit-State Model Checking
  99. An Efficient Implementation of a Phase Unwrapping Kernel on Reconfigurable Hardware
  100. Dynamo: a runtime partitioning system for FPGA-based HW/SW image processing systems
  101. K-means Clustering for Multispectral Images Using Floating-Point Divide
  102. Writing Portable Applications that Dynamically Bind at Run Time to Reconfigurable Hardware
  103. Vforce: An Extensible Framework for Reconfigurable Supercomputing
  104. Advanced Components in the Variable Precision Floating-Point Library
  105. Automatic Sliding Window Operation Optimization for FPGA-Based
  106. Enabling MPEG-2 video playback in embedded systems through improved data cache efficiency
  107. Real-Time Particle Image Velocimetry for Feedback Loops Using FPGA Implementation
  108. Field-Programmable Gate Arrays in Embedded Systems
  109. Poster reception: Improving the performance of parallel backprojection on a reconfigurable supercomputer
  110. Parallel-Beam Backprojection: An FPGA Implementation Optimized for Medical Imaging
  111. Optimizing data intensive window-based image processing on reconfigurable hardware boards
  112. Applying reconfigurable hardware to the analysis of multispectral and hyperspectral imagery
  113. Accurate Power Estimation for Sequential CMOS Circuits Using Graph-based Methods
  114. Design issues for hardware implementation of an algorithm for segmenting hyperspectral imagery
  115. Effect of data truncation in an implementation of pixel clustering on a custom computing machine
  116. HML, a novel hardware description language and its translation to VHDL
  117. A data-centric approach to high-level synthesis
  118. Spatial and color clustering on an FPGA-based computer system
  119. Rothko: a three-dimensional FPGA
  120. Division and square root: choosing the right implementation
  121. Optimizing the data cache performance of a software MPEG-2 video decoder
  122. Rothko: A three dimensional FPGA architecture, its fabrication, and design tools
  123. Area and performance tradeoffs in floating-point divide and square-root implementations
  124. An automaton model for scheduling constraints in synchronous machines
  125. Non-restoring integer square root: A case study in design by principled optimization
  126. Reasoning about pipelines with structural hazards
  127. Verifying a logic-synthesis algorithm and implementation: a case study in software verification
  128. A methodology for efficient hardware verification
  129. PBS: proven Boolean simplification
  130. Erratum to: High level synthesis and generating FPGAs with the BEDROC system
  131. High level synthesis and generating FPGAs with the BEDROC system
  132. Formally verified synthesis of combinational CMOS circuits
  133. From programs to transistors: Verifying hardware synthesis tools
  134. Reasoning about the function and timing of integrated circuits with interval temporal logic
  135. Automatic determination of signal flow through MOS transistor networks
  136. Runtime assignment of reconfigurable hardware components for image processing pipelines
  137. Run-time execution of reconfigurable hardware in a Java environment
  138. Design tradeoffs in a hardware implementation of the k-means clustering algorithm
  139. High level synthesis for designing custom computing hardware
  140. Truly rapid prototyping requires high level synthesis