All Stories

  1. Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators
  2. GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure
  3. Using Additive Modifications in LU Factorization Instead of Pivoting
  4. A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
  5. Using Advanced Vector Extensions AVX-512 for MPI Reductions
  6. Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications
  7. Load-balancing Sparse Matrix Vector Product Kernels on GPUs
  8. Guest editors’ note: Special issue on clusters, clouds, and data for scientific computing
  9. Massively Parallel Automated Software Tuning
  10. PLASMA
  11. Big data and extreme-scale computing
  12. A look back on 30 years of the Gordon Bell Prize
  13. GPU-accelerated co-design of induced dimension reduction
  14. Exascale computing and big data