What can Storage do for AI?
Tue Sep 17 | 2:00pm
Location: Cypress

Abstract
With the increased business value that AI-enabled applications can unlock, there is a need to support Gen AI models at varying degrees of scale, from foundation model training in data centers to inference deployment on edge and mobile devices. Flash storage, and PCIe/NVMe storage in particular, can play an important role in enabling this with its density and cost benefits. Enabling NVMe offload for Gen AI requires a combination of careful ML model design and effective deployment on a memory-flash storage tier. Using inference as an example with the Microsoft DeepSpeed library, we highlight the benefits of NVMe offload and call out specific optimizations and improvements that NVMe storage can target to deliver improved LLM inference metrics.
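As a rough illustration of what enabling NVMe offload involves, the sketch below shows one way to run LLM inference with DeepSpeed ZeRO stage-3 parameter offload to an NVMe tier. The model name, NVMe mount path, buffer sizes, and aio tuning values are illustrative assumptions, not settings from the session.

```python
# Minimal sketch: LLM inference with DeepSpeed ZeRO-3 parameter offload to NVMe.
# Model name, nvme_path, and all buffer/aio values below are illustrative assumptions.
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"            # example model (assumption)
nvme_path = "/local_nvme/ds_offload"        # local NVMe mount (assumption)
os.makedirs(nvme_path, exist_ok=True)

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                          # partition parameters (ZeRO-3)
        "offload_param": {
            "device": "nvme",                # spill model parameters to NVMe flash
            "nvme_path": nvme_path,
            "pin_memory": True,
            "buffer_count": 5,
            "buffer_size": 1_000_000_000,
        },
    },
    "aio": {                                 # async I/O tuning for the NVMe tier
        "block_size": 1_048_576,
        "queue_depth": 8,
        "thread_count": 1,
        "single_submit": False,
        "overlap_events": True,
    },
    "train_micro_batch_size_per_gpu": 1,     # required config key even for inference-only use
}

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Wrap the model with the ZeRO-3 engine so parameters are fetched from NVMe on demand.
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()

inputs = tokenizer("Flash storage can help scale LLM inference by",
                   return_tensors="pt").to(engine.device)
with torch.no_grad():
    outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The key idea is that the model's parameters live on the NVMe tier and are streamed into accelerator memory on demand during the forward pass, so the aio settings largely determine how well the storage keeps the accelerator fed.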
Learning Objectives
Recognize the need to democratize training and inference at scale
Understand what enabling NVMe offload of LLMs requires
Be aware of opportunities for NVMe flash to enable improved LLM inference performance
---
- Suresh Rajgopal, Micron Technology
- Sujit Somandepalli, Micron Technology
- Katya Giannios, Micron Technology