All Stories

  1. Artifact Evaluation in the FPGA Community and in ACM TRETS
  2. Training Signal Optimization for Behavioral Modeling and Digital Predistortion of RF Power Amplifiers
  3. A Survey of FPGA-based 3D CNN Accelerators and Hardware-aware Algorithmic Optimizations
  4. Miniature: Fast AI Supercomputer Networks Simulation on FPGAs
  5. Transfer Learning on the Edge for a Wireless Application Using an SoC Platform
  6. LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference
  7. Realizing Network-Attached FPGAs in the Cloud
  8. Optimizing FPGA Memory Allocation for Matrix-Matrix Multiplication Using Bayesian Optimization
  9. Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function
  10. Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function
  11. Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function
  12. HotaQ: Hardware Oriented Token Adaptive Quantization for Large Language Models
  13. Minimizing Training Signal Length for Power Amplifier Characterization and Linearization
  14. Pets vs Cattle: Heterogeneous Systems in the 21st Century
  15. Artifact Evaluation for ACM TRETS Papers Submitted from the FPT Journal Track
  16. The Future of FPGA Acceleration in Datacenters and the Cloud
  17. Optimizing Designs Using Several Types of Memories on Modern FPGAs
  18. FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization
  19. Evaluating Theoretical Baselines for ML Benchmarking Across Different Accelerators
  20. Computationally Efficient Look-up-Tables for Behavioral Modelling and Digital Pre-distortion of Multi-standard Wireless Systems
  21. FPGAs in The Cloud
  22. FPGAs in the Cloud
  23. High‐performance transformation of protein structure representation from internal to Cartesian coordinates
  24. Strategies and Demonstration to Support Multiple Wireless Protocols with a Single RF Front-End
  25. A Novel Physical Layer Authentication With PAPR Reduction Based on Channel and Hardware Frequency Responses
  26. Real Time Receiver Baseband Processing Platform for Sub 6 GHz PHY Layer Experiments
  27. Evaluation of Optimized CNNs on Heterogeneous Accelerators using a Novel Benchmarking Approach
  28. SIFO: Secure Computational Infrastructure Using FPGA Overlays
  29. QuTiBench
  30. Garbled Circuits in the Cloud using FPGA Enabled Nodes
  31. Detection of Different Wireless Protocols on an FPGA with the Same Analog/RF Front End
  32. High-Level and Compact Design of Cross-Channel LTE DownLink Channel Encoder
  33. FINN-R
  34. Local and Global Shared Memory for Task Based HPC Applications on Heterogeneous Platforms
  35. Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic
  36. Accelerating big data applications using lightweight virtualization framework on enterprise cloud
  37. FPGA modeling techniques for detecting and demodulating multiple wireless protocols
  38. FIM: Performance Prediction for Parallel Computation in Iterative Data Processing Applications
  39. Secure Function Evaluation Using an FPGA Overlay Architecture
  40. A Framework for Developing Parallel Applications with high level Tasks on Heterogeneous Platforms
  41. Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms
  42. Performance prediction techniques for scalable large data processing in distributed MPI systems
  43. Open-Source Variable-Precision Floating-Point Library for Major Commercial FPGAs
  44. Design space exploration of GPU Accelerated cluster systems for optimal data transfer using PCIe bus
  45. Unified and lightweight tasks and conduits: A high level parallel programming framework
  46. Modeling considerations for the hardware-software co-design of flexible modern wireless transceivers
  47. State-Action Based Link Layer Design for IEEE 802.11b Compliant MATLAB-Based SDR
  48. High-level hardware-software co-design of an 802.11a transceiver system using Zynq SoC
  49. Cardiac MRI compressed sensing image reconstruction with a graphics processing unit
  50. High-Level System Design of IEEE 802.11b Standard-Compliant Link Layer for MATLAB-Based SDR
  51. Validity and reliability of Kinect skeleton for measuring shoulder joint angles: a feasibility study
  52. Accelerating K-Means clustering with parallel implementations and GPU computing
  53. GPU implementation of reverse coordinate conversion for proteins
  54. Leakage evaluation on power balance countermeasure against side-channel attack on FPGAs
  55. Balance power leakage to fight against side-channel analysis at gate level in FPGAs
  56. Side-channel analysis of MAC-Keccak hardware implementations
  57. Accuracy of kinect for measuring shoulder joint angles in multiple planes of motion
  58. Kernel Specialization Provides Adaptable GPU Code for Particle Image Velocimetry
  59. Behavioral Non-portability in Scientific Numeric Computing
  60. Implementing a MATLAB-Based Self-configurable Software Defined Radio Transceiver
  61. Power analysis attack on hardware implementation of MAC-Keccak on FPGAs
  62. Accelerating protein coordinate conversion using GPUs
  63. Fast reconstruction of 3D volumes from 2D CT projection data with GPUs
  64. Reducing Processing Latency with a Heterogeneous FPGA-Processor Framework
  65. Validity and reliability of kinect for measuring shoulder joint angles
  66. Make it real: Effective floating-point reasoning via exact arithmetic
  67. FPGA-based hyperspectral covariance coprocessor for size, weight, and power constrained platforms
  68. Vendor agnostic, high performance, double precision Floating Point division for FPGAs
  69. Development of a low-cost, adaptive, clinician-friendly virtual rehabilitation system
  70. Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
  71. A Message from the General Chair and Program Chair
  72. Minimum energy operation for clustered island-style FPGAs
  73. The effect of temporal impulse response on experimental reduction of photon scatter in time-resolved diffuse optical tomography
  74. Minimum Energy Analysis and Experimental Verification of a Latch-Based Subthreshold FPGA
  75. Characterization of a single-supply subthreshold FPGA
  76. Cognitive radio universal software hardware
  77. CUDA and OpenCL implementations of 3D CT reconstruction for biomedical imaging
  78. VForce: An environment for portable applications on high performance systems with accelerators
  79. CRUSH: Cognitive Radio Universal Software Hardware
  80. Heterogeneous tasks and conduits framework for rapid application portability and deployment
  81. Cognitive Radio Universal Software Hardware
  82. Incremental clustering applied to radar deinterleaving
  83. Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation
  84. An Autonomous Vector/Scalar Floating Point Coprocessor for FPGAs
  85. VFloat
  86. Efficient template matching with variable size templates in CUDA
  87. A truly two-dimensional systolic array FPGA implementation of QR decomposition
  88. Implementing a Highly Parameterized Digital PIV System on Reconfigurable Hardware
  89. Message from the ASAP '09 General and Technical Program Chairs
  90. Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA
  91. FPGA Supercomputing Platforms, Architectures, and Techniques for Accelerating Computationally Complex Algorithms
  92. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing
  93. Implementing phase unwrapping using Field Programmable Gate Arrays or Graphics Processing Units: A comparison
  94. Special issue: General-purpose processing using graphics processing units
  95. Efficient Communication Between the Embedded Processor and the Reconfigurable Logic on an FPGA
  96. An efficient implementation of a phase unwrapping kernel on reconfigurable hardware
  97. An FPGA Implementation of Explicit-State Model Checking
  98. An Efficient Implementation of a Phase Unwrapping Kernel on Reconfigurable Hardware
  99. Dynamo: a runtime partitioning system for FPGA-based HW/SW image processing systems
  100. K-means Clustering for Multispectral Images Using Floating-Point Divide
  101. Writing Portable Applications that Dynamically Bind at Run Time to Reconfigurable Hardware
  102. Vforce: An Extensible Framework for Reconfigurable Supercomputing
  103. Advanced Components in the Variable Precision Floating-Point Library
  104. Automatic Sliding Window Operation Optimization for FPGA-Based
  105. Enabling MPEG-2 video playback in embedded systems through improved data cache efficiency
  106. Real-Time Particle Image Velocimetry for Feedback Loops Using FPGA Implementation
  107. Field-Programmable Gate Arrays in Embedded Systems
  108. Poster reception: Improving the performance of parallel backprojection on a reconfigurable supercomputer
  109. Parallel-Beam Backprojection: An FPGA Implementation Optimized for Medical Imaging
  110. Optimizing data intensive window-based image processing on reconfigurable hardware boards
  111. Applying reconfigurable hardware to the analysis of multispectral and hyperspectral imagery
  112. Accurate Power Estimation for Sequential CMOS Circuits Using Graph-based Methods
  113. Design issues for hardware implementation of an algorithm for segmenting hyperspectral imagery
  114. Effect of data truncation in an implementation of pixel clustering on a custom computing machine
  115. HML, a novel hardware description language and its translation to VHDL
  116. A data-centric approach to high-level synthesis
  117. Spatial and color clustering on an FPGA-based computer system
  118. Rothko: a three-dimensional FPGA
  119. Division and square root: choosing the right implementation
  120. Optimizing the data cache performance of a software MPEG-2 video decoder
  121. Rothko: A three dimensional FPGA architecture, its fabrication, and design tools
  122. Area and performance tradeoffs in floating-point divide and square-root implementations
  123. An automaton model for scheduling constraints in synchronous machines
  124. Non-restoring integer square root: A case study in design by principled optimization
  125. Reasoning about pipelines with structural hazards
  126. Verifying a logic-synthesis algorithm and implementation: a case study in software verification
  127. A methodology for efficient hardware verification
  128. PBS: proven Boolean simplification
  129. Erratum to: High level synthesis and generating FPGAs with the BEDROC system
  130. High level synthesis and generating FPGAs with the BEDROC system
  131. Formally verified synthesis of combinational CMOS circuits
  132. From programs to transistors: Verifying hardware synthesis tools
  133. Reasoning about the function and timing of integrated circuits with interval temporal logic
  134. Automatic determination of signal flow through MOS transistor networks
  135. Runtime assignment of reconfigurable hardware components for image processing pipelines
  136. Run-time execution of reconfigurable hardware in a Java environment
  137. Design tradeoffs in a hardware implementation of the k-means clustering algorithm
  138. High level synthesis for designing custom computing hardware
  139. Truly rapid prototyping requires high level synthesis