All Stories

  1. Dynamic Thread Coarsening for CPU and GPU OpenMP Code
  2. Predicting Performance for OpenMP GPU Parameter Choices
  3. Profile Generation for GPU Targets
  4. Automatic Parallelization and OpenMP Offloading of Fortran Array Notation
  5. Memory Transfer Decomposition: Exploring Smart Data Movement Through Architecture-Aware Strategies
  6. Precision and Performance Analysis of C Standard Math Library Functions on GPUs
  7. OpenMP Kernel Language Extensions for Performance Portable GPU Codes
  8. Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay
  9. Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution
  10. ORAQL — Optimistic Responses to Alias Queries in LLVM
  11. Implementing OpenMP’s SIMD Directive in LLVM’s GPU Runtime
  12. Exploring the Limits of Generic Code Execution on GPUs via Direct (OpenMP) Offload
  13. MARTINI: The Little Match and Replace Tool for Automatic Code Rewriting
  14. Remote OpenMP offloading
  15. OpenMP application experiences: Porting to accelerated nodes
  16. Concurrent Execution of Deferred OpenMP Target Tasks with Hidden Helper Threads
  17. Remote OpenMP Offloading
  18. MARTINI: The Little Match and Replace Tool for Automatic Application Rewriting with Code Examples
  19. Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading
  20. Towards Automatic OpenMP-Aware Utilization of Fast GPU Memory
  21. Experience Report: Writing a Portable GPU Runtime with OpenMP 5.1
  22. Really Embedding Domain-Specific Languages into C++
  23. Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation
  24. FAROS: A Framework to Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis
  25. Compiler Optimizations for Parallel Programs
  26. Performance Exploration Through Optimistic Static Program Annotations
  27. The TRegion Interface and Compiler Optimizations for OpenMP Target Regions
  28. Polyhedral expression propagation
  29. Compiler Optimizations for OpenMP
  30. Optimistic loop optimization
  31. Input space splitting for OpenCL
  32. Runtime pointer disambiguation
  33. Runtime pointer disambiguation
  34. Generalized Task Parallelism
  35. Architecture-parametric timing analysis
  36. Impact of Resource Sharing on Performance and Performance Prediction: A Survey