What is the Role of Flash in Data Storage Ingestion Within the AI Pipeline?

Wed Sep 18 | 11:35am
Location: Cypress
Abstract

HDDs have been the traditional hardware infrastructure for object stores such as Amazon S3, Google Cloud Storage, and Azure Blob Storage in data lakes. But as AI deployments transition to production scale in organizations (Meta's Tectonic-Shift platform being a good example), they impose demands on the data storage ingestion pipeline that have not been seen before. Using Deep Learning Recommendation Model (DLRM) training as an AI use case, we first introduce the challenges object stores can expect to face as AI deployments scale. These include the growth in the scale of available data, ever-faster training GPUs, and the expansion of AI/ML ops deployments. We then explain how flash storage is well positioned to meet the bandwidth and power requirements of these systems. We will share key observations from storage trace analysis of several MLPerf DLRM preprocessing and training captures. We will conclude with a call to action for more work on standardizing benchmarks that characterize data ingestion performance and power efficiency.

Learning Objectives

Understand the role of data ingestion in the AI pipeline
Describe how AI deployment at scale has changed and what is expected of object stores
Understand how flash storage can help address this problem and what more is needed


---

Suresh Rajgopal
Micron Technology