Integrating S3 into Distributed, Multi-protocol Hyperscale NAS

Wed Sep 18 | 10:35am
Location: Stevens Creek
Abstract

GPU-based computing for AI/DL and other high-performance workflows imposes performance requirements that legacy file and object storage systems struggle to meet. Traditionally, such use cases have required deploying parallel file systems such as Lustre, which demand networking infrastructure and skill sets not typically found in standard enterprise data centers.

Standards-based parallel file systems such as pNFS v4.2 provide the high performance needed for such workloads, and do so on commodity hardware and standard Ethernet infrastructure. They also provide the multi-protocol file and object access not typically supported by HPC parallel file systems. pNFS v4.2 architectures used in this way are often called Hyperscale NAS, since they combine very high-throughput parallel file system performance with the standard capabilities of enterprise NAS solutions. This is the architecture deployed at Meta to feed 24,000 GPUs in its AI Research SuperCluster at 12.5 TB per second, on commodity hardware and standard Ethernet, to power its Llama 2 and Llama 3 large language models (LLMs).

But AI/DL data sets are often distributed across multiple incompatible storage types in one or more locations, including S3 storage at edge locations. Traditionally, pulling S3 data from the edge into such workflows has required deploying file gateways or other methods to bridge protocols.

This session will look at an architecture that enables data on S3 storage to be seamlessly and automatically integrated into a multi-platform, multi-protocol, multi-site Hyperscale NAS environment. Drawing on real-world implementations, the session will highlight how this standards-based approach lets organizations use conventional enterprise infrastructure, with data in place on existing storage of any type, to feed GPU-based AI and other high-performance workflows.
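As a rough illustration of what multi-protocol access to a single namespace looks like from a client's perspective, the sketch below reads the same bytes once through a POSIX path (for example, over an NFS v4.2 mount) and once through the S3 API. The mount point, endpoint, bucket, and object key are hypothetical examples and are not tied to any specific product implementation.

```python
# Illustrative sketch only: mount point, endpoint, bucket, and key are hypothetical.
import boto3

# File-based access, e.g. over an NFS v4.2 (pNFS) mount exposed by the NAS layer.
with open("/mnt/hyperscale/datasets/train/sample.bin", "rb") as f:
    data_via_file = f.read()

# Object-based access to the same data via the S3 API.
s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")
resp = s3.get_object(Bucket="training-data", Key="datasets/train/sample.bin")
data_via_s3 = resp["Body"].read()

assert data_via_file == data_via_s3  # same bytes, reached over two protocols
```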

Learning Objectives

Learn how S3 data silos can be seamlessly integrated into the high-performance, multi-protocol parallel file system workflows needed to power GPU computing for AI/DL and other high-performance use cases.
Understand how distributed data sources can be consolidated for high-performance use cases with data in place, without copying it into a new, proprietary, and often siloed repository.
Learn best practices for combining commodity hardware, standard networking, and existing multi-vendor, multi-protocol, and often distributed storage resources into an integrated, vendor-neutral environment capable of powering high-performance use cases.


---

Alan Wright
Hammerspace