All Stories

  1. Affinity-Based Thread and Data Mapping in Shared Memory Systems
  2. Strategies to Improve the Performance of a Geophysics Model for Different Manycore Systems
  3. Data mining the memory access stream to detect anomalous application behavior
  4. HPC Application Performance and Cost Efficiency in the Cloud
  5. Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications
  6. Optimizing memory affinity with a hybrid compiler/OS approach
  7. Performance Evaluation of Multiple Cloud Data Centers Allocations for HPC
  8. Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures
  9. Kernel-Based Thread and Data Mapping for Improved Memory Affinity
  10. A dynamic block-level execution profiler
  11. LAPT: A locality-aware page table for thread and data mapping
  12. Automatic Communication Optimization of Parallel Applications in Public Clouds
  13. Modeling memory access behavior for data mapping
  14. Analyzing and Improving Memory Access Patterns of Large Irregular Applications on NUMA Machines
  15. Communication in Shared Memory: Concepts, Definitions, and Efficient Detection
  16. Exploring Cache Size and Core Count Tradeoffs in Systems with Reduced Memory Access Latency
  17. A Sharing-Aware Memory Management Unit for Online Mapping in Multi-core Architectures
  18. Opportunities and Challenges of Performing Vector Operations inside the DRAM
  19. Saving memory movements through vector processing in the DRAM
  20. SiNUCA: A Validated Micro-Architecture Simulator
  21. Characterizing communication and page usage of parallel applications for thread and data mapping
  22. Reconfigurable Vector Extensions inside the DRAM
  23. Communication-aware thread mapping using the translation lookaside buffer
  24. Partial coscheduling of virtual machines based on memory access patterns
  25. An Efficient Algorithm for Communication-Based Task Mapping
  26. Communication-aware process and thread mapping using online communication detection
  27. Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems
  28. Locality and Balance for Communication-Aware Thread Mapping in Multicore Systems
  29. TABARNAC
  30. Optimizing Memory Locality Using a Locality-Aware Page Table
  31. Profiling and Reducing Micro-Architecture Bottlenecks at the Hardware Level
  32. Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols
  33. kMAF
  34. Energy Efficient Last Level Caches via Last Read/Write Prediction
  35. Communication-Based Mapping Using Shared Pages
  36. Analyzing resource interdependencies in multi-core architectures to improve scheduling decisions
  37. High Performance Computing in the cloud: Deployment, performance and cost efficiency
  38. Evaluating High Performance Computing on the Windows Azure Platform
  39. Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory
  40. Trace-Based Visualization as a Tool to Understand Applications' I/O Performance in Multi-core Machines
  41. Evaluating Thread Placement Based on Memory Access Patterns for Multi-core Processors