All Stories

  1. Fine-grained Policy-driven I/O Sharing for Burst Buffers
  2. Performance Analysis and Optimal Node-aware Communication for Enlarged Conjugate Gradient Methods
  3. EMPRESS: Accelerating Scientific Discovery through Descriptive Metadata Management
  4. Realizing the Vision of CFD in 2030
  5. Succeeding Together
  6. Performance Portability for Advanced Architectures
  7. Translational research in the MPICH project
  8. Convergence of artificial intelligence and high performance computing on NSF-supported cyberinfrastructure
  9. HAL: Computer System for Scalable Deep Learning
  10. Convergence of Artificial Intelligence and High Performance Computing on NSF-supported Cyberinfrastructure
  11. Node-Aware Improvements to Allreduce
  12. Enabling real-time multi-messenger astrophysics discoveries with deep learning
  13. Node aware sparse matrix–vector multiplication
  14. Guest editor's introduction: Special issue on best papers from EuroMPI/USA 2017
  15. Using Node Information to Implement MPI Cartesian Topologies
  16. Big data and extreme-scale computing
  17. The Blue Waters Super-System for Super-Science
  18. Final report for “Extreme-scale Algorithms and Solver Resilience”
  19. Key Value Stores in HPC
  20. Towards millions of communicating threads
  21. Modeling MPI Communication Performance on SMP Nodes
  22. Rethinking High Performance Computing System Architecture for Scientific Big Data Applications
  23. An implementation and evaluation of the MPI 3.0 one-sided communication interface
  24. Final report: Compiled MPI. Cost-Effective Exascale Application Development
  25. Efficient disk-to-disk sorting
  26. Message Passing Interface
  27. Remote Memory Access Programming in MPI-3
  28. A study of file system read and write behavior on supercomputers
  29. Runtime Support for Irregular Computation in MPI-Based Applications
  30. Algebraic Multigrid on a Dragonfly Network: First Experiences on a Cray XC30
  31. Rethinking Key-Value Store for Parallel I/O Optimization
  32. Nonblocking Epochs in MPI One-Sided Communication
  33. Decoupled I/O for Data-Intensive High Performance Computing
  34. Enabling the environmentally clean air transportation of the future: a vision of computational fluid dynamics in 2030
  35. Special Issue: SC13 – The International Conference for High Performance Computing, Networking, Storage and Analysis
  36. MPI-Interoperable Generalized Active Messages
  37. Optimization Strategies for MPI-Interoperable Active Messages
  38. Programming for Exascale Computers
  39. Analysis of topology-dependent MPI performance on Gemini networks
  40. Runtime system design of decoupled execution paradigm for data-intensive high-end computing
  41. MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory
  42. Toward Asynchronous and MPI-Interoperable Active Messages
  43. Performance Analysis of the Lattice Boltzmann Model Beyond Navier-Stokes
  44. Systematic Reduction of Data Movement in Algebraic Multigrid Solvers
  45. Multiphysics simulations
  46. Applications of the streamed storage format for sparse matrix operations
  47. Parallel Adaptive Deflated GMRES
  48. A Case for Optimistic Coordination in HPC Storage Systems
  49. Performance Modeling of Algebraic Multigrid on Blue Gene/Q: Lessons Learned
  50. A Decoupled Execution Paradigm for Data-Intensive High-End Computing
  51. Modeling the Performance of an Algebraic Multigrid Cycle Using Hybrid MPI/OpenMP
  52. Best algorithms + best computers = powerful match
  53. Hybrid Static/dynamic Scheduling for Already Optimized Dense Matrix Factorization
  54. Faster topology-aware collective algorithms through non-minimal communication
  55. Faster topology-aware collective algorithms through non-minimal communication
  56. Adaptive Strategy for One-Sided Communication in MPICH2
  57. Efficient Multithreaded Context ID Allocation in MPI
  58. Leveraging MPI’s One-Sided Communication Interface for Shared-Memory Programming
  59. MPI 3 and Beyond: Why MPI Is Successful and What Challenges It Faces
  60. Formal analysis of MPI-based parallel programs
  61. Weighted locality-sensitive scheduling for mitigating noise on multi-core clusters
  62. Avoiding hot-spots on two-level direct networks
  63. Performance modeling for systematic performance tuning
  64. Modeling the performance of an algebraic multigrid cycle on HPC platforms
  65. LACIO: A New Collective I/O Strategy for Parallel I/O Systems
  66. Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes
  67. MPI on Millions of Cores
  68. EcoG: A Power-Efficient GPU Cluster Architecture for Scientific Computing
  69. The International Exascale Software Project roadmap
  70. Multi-core and Network Aware MPI Topology Functions
  71. Performance Expectations and Guidelines for MPI Derived Datatypes
  72. Scalable Memory Use in MPI: A Case Study with MPICH2
  73. Minimizing MPI Resource Contention in Multithreaded Multicore Environments
  74. P2S2 2010: Third International Workshop on Parallel Programming Models and Systems Software for High-End Computing
  75. Optimizing Sparse Data Structures for Matrix-vector Multiply
  76. Self-Consistent MPI Performance Guidelines
  77. Erratum
  78. Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming
  79. The Importance of Non-Data-Communication Overheads in MPI
  80. A Pipelined Algorithm for Large, Irregular All-Gather Problems
  81. The Importance of Non-Data-Communication Overheads in MPI
  82. An adaptive performance modeling tool for GPU architectures
  83. An adaptive performance modeling tool for GPU architectures
  84. A Scalable MPI_Comm_split Algorithm for Exascale Computing
  85. An introductory exascale feasibility study for FFTs and multigrid
  86. Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems
  87. Load Balancing for Regular Meshes on SMPs with MPI
  88. PMI: A Scalable Parallel Process-Management Interface for Extreme-Scale Systems
  89. Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues
  90. Formal methods applied to high-performance computing software design: a case study of MPI one-sided communication-based locking
  91. Test suite for evaluating performance of multithreaded MPI communication
  92. On the Need for a Consortium of Capability Centers
  93. Toward Exascale Resilience
  94. Investigating High Performance RMA Interfaces for the MPI-3 Standard
  95. Software for Petascale Computing Systems
  96. Toward message passing for a million processes: characterizing MPI on a massive scale Blue Gene/P
  97. MPI on a Million Processors
  98. Processing MPI Datatypes Outside MPI
  99. Natively Supporting True One-Sided Communication in MPI on Multi-core Systems with InfiniBand
  100. Hiding I/O latency with pre-execution prefetching for parallel applications
  101. Parallel I/O prefetching using MPI file caching and I/O signatures
  102. Exploring Parallel I/O Concurrency with Speculative Prefetching
  103. Applied Mathematics at the U.S. Department of Energy: Past, Present and a View to the Future
  104. An Efficient Format for Nearly Constant-Time Access to Arbitrary Time Intervals in Large Trace Files
  105. Non-data-communication Overheads in MPI: Analysis on Blue Gene/P
  106. A Simple, Pipelined Algorithm for Large, Irregular All-gather Problems
  107. A Formal Approach to Detect Functionally Irrelevant Barriers in MPI Programs
  108. Communication Analysis of Parallel 3D FFT for Flat Cartesian Meshes on Large Blue Gene Systems
  109. Implementing Efficient Dynamic Formal Verification Methods for MPI Programs
  110. Improving the Performance of Tensor Matrix Vector Multiplication in Cumulative Reaction Probability Based Quantum Chemistry Codes
  111. Self-consistent MPI-IO Performance Requirements and Expectations
  112. Toward Efficient Support for Multithreaded MPI Communication
  113. Analyzing the impact of supporting out-of-order communication on in-order performance with iWARP
  114. Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand
  115. Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem
  116. Thread-safety in an MPI implementation: Requirements and analysis
  117. A Portable Method for Finding User Errors in the Usage of MPI Collective Operations
  118. Electron injection by a nanowire in the bubble regime
  119. MPI - Eine Einführung (German: MPI - An Introduction)
  120. Nonuniformly Communicating Noncontiguous Data: A Case Study with PETSc and MPI
  121. Self-consistent MPI Performance Requirements
  122. Collective communication on architectures that support simultaneous communication over multiple links
  123. An Interface to Support the Identification of Dynamic MPI 2 Processes for Scalable Parallel Debugging
  124. Automatic Memory Optimizations for Improving MPI Derived Datatype Performance
  125. S01: Advanced MPI
  126. M01: Application supercomputing and multiscale simulation techniques
  127. Awards & video: Awards session
  128. Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem
  129. Formal Verification of Programs That Use MPI One-Sided Communication
  130. Issues in Developing a Thread-Safe MPI Implementation
  131. Message from the Program Chair
  132. Multi-core issues: Multi-Core for HPC
  133. Optimizing the Synchronization Operations in Message Passing Interface One-Sided Communication
  134. Optimization of Collective Communication Operations in MPICH
  135. Using MPI-2: A Problem-Based Approach
  136. Collective Error Detection for MPI Collective Operations
  137. An Evaluation of Implementation Options for MPI One-Sided Communication
  138. Designing a Common Communication Subsystem
  139. Implementing MPI-IO atomic mode without file system support
  140. Towards a Productive MPI Environment
  141. Fault Tolerance in Message Passing Interface Programs
  142. Efficient Implementation of MPI-2 Passive One-Sided Communication on InfiniBand Clusters
  143. Minimizing Synchronization Overhead in the Implementation of MPI One-Sided Communication
  144. High performance MPI-2 one-sided communication over InfiniBand
  145. Parallel netCDF
  146. Integrated Network Management VIII
  147. Improving the performance of MPI derived datatypes by optimizing memory-access cost
  148. Efficient structured data access in parallel file systems
  149. Improving the Performance of Collective Operations in MPICH
  150. Noncontiguous I/O accesses through MPI-IO
  151. Optimizing noncontiguous accesses in MPI-IO
  152. MPI on the Grid
  153. Parallel Programming with MPI
  154. Parallel Programming with MPI
  155. Components and interfaces of a process management system for parallel programs
  156. High-performance parallel implicit CFD
  157. Scalable Unix Commands for Parallel Processors: A High-Performance Implementation
  158. Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD
  159. A Scalable Process-Management Environment for Parallel Programs
  160. Analyzing the Parallel Scalability of an Implicit Unstructured Mesh CFD Code
  161. From Trace Generation to Visualization: A Performance Framework for Distributed Parallel Systems
  162. Performance Modeling and Tuning of an Unstructured Mesh CFD Application
  163. Towards Realistic Performance Bounds for Implicit CFD Codes
  164. Toward Scalable Performance Visualization with Jumpshot
  165. Parallel computation of three-dimensional nonlinear magnetostatic problems
  166. On implementing MPI-IO portably and with high performance
  167. Using MPI-2
  168. Using MPI
  169. Achieving high sustained performance in an unstructured mesh CFD application
  170. Data sieving and collective I/O in ROMIO
  171. I/O in Parallel Applications: the Weakest Link
  172. Parallel Newton-Krylov-Schwarz Algorithms for the Transonic Full Potential Equation
  173. A Case for Using MPI's Derived Datatypes to Improve I/O Performance
  174. MPI - The Complete Reference
  175. Parallel Implicit PDE Computations
  176. Sowing Mpich: a Case Study in the Dissemination of a Portable Environment for Parallel Scientific Computing
  177. A high-performance MPI implementation on a shared-memory vector supercomputer
  178. Why are PVM and MPI so different?
  179. A high-performance, portable implementation of the MPI message passing interface standard
  180. I/O characterization of a portable astrophysics application on the IBM SP and Intel Paragon
  181. Numerical Simulation of Vortex Dynamics in Type-II Superconductors
  182. An experimental evaluation of the parallel I/O systems of the IBM SP and Intel Paragon using a production application
  183. Early Applications in the Message-Passing Interface (MPI)
  184. Experiences with the IBM SP1
  185. Solution of dense systems of linear equations arising from integral-equation formulations
  186. Users guide for the ANL IBM SPx
  187. A comparison of some domain decomposition and ILU preconditioned iterative methods for nonsymmetric elliptic problems
  188. Newton-Krylov-Schwarz Methods in CFD
  189. Parallel implicit methods for aerodynamics
  190. Early experiences with the IBM SP-1
  191. Users manual for the Chameleon parallel programming tools
  192. A test implementation of the MPI draft message-passing standard
  193. Convergence rate estimate for a domain decomposition method
  194. Parallel Performance of Domain-Decomposed Preconditioned Krylov Methods for PDEs with Locally Uniform Refinement
  195. Domain decomposition techniques for the parallel solution of nonsymmetric systems of elliptic boundary value problems
  196. Krylov methods preconditioned with incompletely factored matrices on the CM-2
  197. A parallel version of the fast multipole method
  198. Computational fluid dynamics on parallel processors
  199. Domain decomposition on parallel computers
  200. Recursive mesh refinement on hypercubes
  201. A Parallel Version of the Fast Multipole Method
  202. Recursive Mesh Refinement on Hypercubes
  203. Complexity of Parallel Implementation of Domain Decomposition Techniques for Elliptic Partial Differential Equations
  204. Local uniform mesh refinement on loosely-coupled parallel processors
  205. Solving PDEs on loosely-coupled parallel processors
  206. A Comparison of Domain Decomposition Techniques for Elliptic Partial Differential Equations and their Parallel Implementation
  207. Local Uniform Mesh Refinement on Vector and Parallel Processors
  208. Predicting memory-access cost based on data-access patterns
  209. Grid-based Image Registration
  210. Observations on WoCo9
  211. Data Transfers between Processes in an SMP System: Performance Study and Application to MPI
  212. High Performance File I/O for The Blue Gene/L Supercomputer
  213. A taxonomy of programming models for symmetric multiprocessors and SMP clusters
  214. An abstract-device interface for implementing portable parallel-I/O interfaces
  215. Developing Applications for a Heterogeneous Computing Environment
  216. Dynamic process management in an MPI setting
  217. Goals guiding design: PVM and MPI
  218. Open Issues in MPI Implementation
  219. Practical Model-Checking Method for Verifying Correctness of MPI Programs
  220. Revealing the Performance of MPI RMA Implementations
  221. Scalable Unix tools on parallel processors
  222. The MPI communication library: its design and a portable implementation