All Stories

  1. Interpreting High Order Epistasis Using Sparse Transformers
  2. Transient-Execution Attacks: a Computer Architect Perspective
  3. Temperature-aware Core Management in MPSoCs: Modeling and Evaluation using MRMs
  4. A New Energy-Efficient Hybrid Wide-Operand Adder Architecture
  5. A Survey on Fully Homomorphic Encryption
  6. Adaptive Scheduling Framework for Real-Time Video Encoding on Heterogeneous Systems
  7. $2^n$ RNS Scalers for Extended 4-Moduli Sets
  8. GPU-assisted HEVC intra decoder
  9. Arithmetic-Based Binary-to-RNS Converter Modulo ${\{2^{n}{\pm}k\}}$ for $jn$-bit Dynamic Range
  10. Reverse Converter Design via Parallel-Prefix Adders: Novel Components, Methodology, and Implementations
  11. Stretching the limits of Programmable Embedded Devices for Public-key Cryptography
  12. ROM-less RNS-to-binary converter moduli $\{2^{2n} - 1, 2^{2n} + 1, 2^{n} - 3, 2^{n} + 3\}$
  13. Collaborative inter-prediction on CPU+GPU systems
  14. Reconfigurable data flow engine for HEVC motion estimation
  15. On the Evaluation of Multi-core Systems with SIMD Engines for Public-Key Cryptography
  16. Performance-Aware Task Management and Frequency Scaling in Embedded Systems
  17. FEVES: Framework for Efficient Parallel Video Encoding on Heterogeneous Systems
  18. Efficient sign identification engines for integers represented in RNS extended 3-moduli set $\{2^{n} - 1, 2^{n} + k, 2^{n} + 1\}$
  19. Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs
  20. Method for designing multi-channel RNS architectures to prevent power analysis SCA
  21. Combining flexibility with low power: Dataflow and wide-pipeline LDPC decoding engines in the Gbit/s era
  22. Cooperative CPU+GPU deblocking filter parallelization for high performance HEVC video codecs
  23. Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems
  24. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models
  25. A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS
  26. An Efficient Scalable RNS Architecture for Large Dynamic Ranges
  27. Cache-aware Roofline model: Upgrading the loft
  28. Finite-Difference in Time-Domain Scalable Implementations on CUDA and OpenCL
  29. Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems
  30. EFFICIENT METHOD FOR DESIGNING MODULO $\{2^{n} \pm k\}$ MULTIPLIERS
  31. SchedMon: A Performance and Energy Monitoring Tool for Modern Multi-cores
  32. Monitoring Performance and Power for Application Characterization with the Cache-Aware Roofline Model
  33. Exploiting Coarse-grained Parallelism in Multi-transform Architectures for H.264/AVC High Profile Codecs
  34. Method to Design General RNS Reverse Converters for Extended Moduli Sets
  35. Open the Gates: Using High-level Synthesis towards programmable LDPC decoders on FPGAs
  36. Randomised multi-modulo residue number system architecture for double-and-add to prevent power analysis side channel attacks
  37. A Lab Project on the Design and Implementation of Programmable and Configurable Embedded Systems
  38. A comparison of computing architectures and parallelization frameworks based on a two-dimensional FDTD
  39. Exploiting task and data parallelism for advanced video coding on hybrid CPU + GPU platforms
  40. A compact and scalable RNS architecture
  41. RNS Reverse Converters for Moduli Sets With Dynamic Ranges up to $(8n+1)$-bit
  42. An RNS-based architecture targeting hardware accelerators for modular arithmetic
  43. Accelerating the Computation of Induced Dipoles for Molecular Mechanics with Dataflow Engines
  44. The CRNS framework and its application to programmable and reconfigurable cryptography
  45. Multi-level Parallelization of Advanced Video Coding on Hybrid CPU+GPU Platforms
  46. Reconfigurable Architecture for Cryptography over Binary Finite Fields
  47. 2-Axis Magnetometers Based on Full Wheatstone Bridges Incorporating Magnetic Tunnel Junctions Connected in Series
  48. Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems
  49. Real-time implementation of remotely sensed hyperspectral image unmixing on GPUs
  50. RNS Arithmetic Units for Modulo $\{2^{n} \pm k\}$
  51. VLSI Reverse Converter for RNS Based on the Moduli Set
  52. High Performance Unified Architecture for Forward and Inverse Quantization in H.264/AVC
  53. Fine-grain parallelism using multi-core, Cell/BE, and GPU Systems
  54. Energy efficient stream-based configurable architecture for embedded platforms
  55. Simultaneous Multi-Level Divisible Load Balancing for Heterogeneous Desktop Systems
  56. Computation of Induced Dipoles in Molecular Mechanics Simulations Using Graphics Processors
  57. Corrections to “MRC-Based RNS Reverse Converters for the Four-Moduli Sets $\{2^{n} + 1,\ 2^{n} - 1,\ 2^{n},\ 2^{2n + 1} - 1\}$ and
  58. MRC-Based RNS Reverse Converters for the Four-Moduli Sets $\{2^{n} + 1, 2^{n} - 1, 2^{n}, 2^{2n + 1} - 1\}$ and $ \{2^{n} + 1, 2^{n} - ...
  59. Configurable M-factor VLSI DVB-S2 LDPC decoder architecture with optimized memory tiling design
  60. On Realistic Divisible Load Scheduling in Highly Heterogeneous Distributed Systems
  61. Efficient implementation of multi-moduli architectures for Binary-to-RNS conversion
  62. Scheduling Divisible Loads on Heterogeneous Desktop Systems with Limited Memory
  63. Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters
  64. A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing
  65. Parallel Computing – Special Issue
  66. Binary-to-RNS Conversion Units for moduli $\{2^{n} \pm 3\}$
  67. High throughput and scalable architecture for unified transform coding in embedded H.264/AVC video coding systems
  68. Real-time DVB-S2 LDPC decoding on many-core GPU accelerators
  69. Massively LDPC Decoding on Multicore Architectures
  70. Parallel LDPC Decoding
  71. Introduction
  72. A quantitative analysis of firing rate estimators: Unveiling bias sources
  73. Exploiting SIMD extensions for linear image processing with OpenCL
  74. Hardware/software co-design of H.264/AVC encoders for multi-core embedded systems
  75. H.264/AVC framework for multi-core embedded video encoders
  76. Unifying stream based and reconfigurable computing to design application accelerators
  77. An improved RNS generator $2^{n} \pm k$ based on threshold logic
  78. Arithmetic Units for RNS Moduli $\{2^{n} - 3\}$ and $\{2^{n} + 3\}$ Operations
  79. Embedded multicore architectures for LDPC decoding
  80. Elliptic Curve point multiplication on GPUs
  81. Efficient Independent Component Analysis on a GPU
  82. Challenges and trends in the development of a magnetoresistive biochip portable platform
  83. Programming Cell/BE and GPUs systems for real-time video encoding
  84. Collaborative execution environment for heterogeneous parallel systems
  85. Modeling and Evaluating Non-shared Memory CELL/BE Type Multi-core Architectures for Local Image and Video Processing
  86. Preface
  87. Euro-Par 2009 – Parallel Processing Workshops
  88. Iterative induced dipoles computation for molecular mechanics on GPUs
  89. p264
  90. Development and evaluation of scalable video motion estimators on GPU
  91. Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function
  92. Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach
  93. Modelling and programming stream-based distributed computing based on the meta-pipeline approach
  94. How GPUs can outperform ASICs for fast LDPC decoding
  95. Multi-core platforms for signal processing: source and channel coding
  96. Neural code metrics: Analysis and application to the assessment of neural models
  97. CaravelaMPI: Message Passing Interface for Parallel GPU-Based Applications
  98. Distributed Software Platform for Automation and Control of General Anaesthesia
  99. A Portable and Autonomous Magnetic Detection Platform for Biosensing
  100. BIOELECTRONIC VISION
  101. Compact and Flexible Microcoded Elliptic Curve Processor for Reconfigurable Devices
  102. Bioelectronic Vision
  103. Applying the Stream-Based Computing Model to Design Hardware Accelerators: A Case Study
  104. Parallel LDPC Decoding on the Cell/B.E. Processor
  105. On the design of distributed autonomous embedded systems for biomedical applications
  106. Efficient FPGA elliptic curve cryptographic processor over $GF(2^{m})$
  107. Design and implementation of a tool for modeling and programming deadlock free meta-pipeline applications
  108. Merged Computation for Whirlpool Hashing
  109. Edge Stream Oriented LDPC Decoding
  110. On-the-fly attestation of reconfigurable hardware
  111. Merged computation for Whirlpool hashing
  112. A Parallel Algorithm for Advanced Video Motion Estimation on Multicore Architectures
  113. Low power microarchitecture with instruction reuse
  114. Distributed Web-based Platform for Computer Architecture Simulation
  115. Heuristic Optimization Methods for Improving Performance of Recursive General Purpose Applications on GPUs
  116. Application Specific Programmable IP Core for Motion Estimation: Technology Comparison Targeting Efficient Embedded Co-Processing Units
  117. An RNS based Specific Processor for Computing the Minimum Sum-of-Absolute-Differences
  118. BRAM-LUT Tradeoff on a Polymorphic DES Design
  119. Reconfigurable architectures and processors for real-time video motion estimation
  120. QCA-LG: A tool for the automatic layout generation of QCA combinational circuits
  121. Efficient Hybrid DCT-Domain Algorithm for Video Spatial Downscaling
  122. A Run-Time Reconfigurable Processor for Video Motion Estimation
  123. Meta-Pipeline: A New Execution Mechanism for Distributed Pipeline Processing
  124. Adaptive Motion Estimation Algorithm for H.264/AVC
  125. An Efficient Expectation-Maximisation Algorithm for Spike Classification
  126. An ASIP approach for adaptive AVC Motion Estimation
  127. Efficient Method for Magnitude Comparison in RNS Based on Two Pairs of Conjugate Moduli
  128. Caravela: A Novel Stream-Based Distributed Computing Environment
  129. Additive Logistic Regression Applied to Retina Modelling
  130. Feature Selection for the Stochastic Integrate and Fire Model
  131. Design and implementation of a stream-based distributed computing platform using graphics processing units
  132. Data buffering optimization methods toward a uniform programming interface for gpu-based applications
  133. Embedded Systems for Portable and Mobile Video Platforms
  134. A New Hand-Held Microsystem Architecture for Biological Analysis
  135. MAESTRO2: EXPERIMENTAL EVALUATION OF COMMUNICATION PERFORMANCE IMPROVEMENT TECHNIQUES IN THE LINK LAYER
  136. Configurable Embedded Core for Controlling Electro-Mechanical Systems
  137. Improving SHA-2 Hardware Implementations
  138. Rescheduling for Optimized SHA-1 Calculation
  139. Low Power Distance Measurement Unit for Real-Time Hardware Motion Estimators
  140. On Task Scheduling Accuracy: Evaluation Methodology and Results
  141. List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures
  142. A programmable cellular neural network circuit
  143. Fast transcoding architectures for insertion of non-regular shaped objects in the compressed DCT-domain
  144. An FPL Bioinspired Visual Encoding System to Stimulate Cortical Neurons in Real-Time
  145. Customisable Core-Based Architectures for Real-Time Motion Estimation on FPGAs
  146. A New Efficient VLSI Architecture for Full Search Block Matching Motion Estimation
  147. Synchronous Non-local Image Processing on Orthogonal Multiprocessor Systems
  148. Exploiting Unused Time Slots in List Scheduling Considering Communication Contention
  149. A Platform Independent Parallelising Tool Based on Graph Theoretic Models
  150. Scheduling Task Graphs on Arbitrary Processor Architectures Considering Contention
  151. Customizable and Reduced Hardware Motion Estimation Processors
  152. Massive Data Classification of Neural Responses
  153. Bioinspired Stimulus Encoder for Cortical Visual Neuroprostheses
  154. On the Implementation and Evaluation of Berkeley Sockets on Maestro2 cluster computing environment
  155. Nanotechnology and the Detection of Biomolecular Recognition Using Magnetoresistive Transducers