HPC architectures increasingly handle workloads whose working data set cannot be easily partitioned or is too large to fit into node-local memory. We have defined a system architecture and a software stack that allow large data sets to be held in fabric-attached memory (FAM) accessible to all compute nodes across a Slingshot-connected HPC cluster, providing a new approach to handling large data sets. Emerging AI and data analytics workloads are becoming increasingly important for HPC architectures because HPC clusters provide the computation capabilities needed at scale; however, a divide still exists between traditional HPC, AI, and data analytics applications because the three communities use very different programming models. The architecture leverages emerging hardware capabilities such as CXL, along with ideas from both HPC and high-performance data analytics software, to support AI and data analytics on HPC clusters. This presentation will cover the architecture, the software stack, and its value, illustrated through a use case: an Arkouda-based proxy application for real-time data analytics.
Fabric Attached Memory – Hardware and Software Architecture
Thu Sep 21 | 8:15am
Location: Salon IV
Abstract
Learning Objectives
- Fabric Attached Memory and its benefits to distributed HPC applications.
- Building a system with Fabric Attached Memory and the software stack required to support it.
- Example application use case for Fabric Attached Memory.
---
David Emberson
Hewlett Packard Enterprise
- Sharad Singhal, Hewlett Packard Enterprise
- Clarete Crasta, Hewlett Packard Enterprise