Benchmarking Storage with AI Workloads

Abstract

Modern data centers increasingly face performance challenges due to the rising volume of datasets and the complexity of deep learning workloads. Considerable research and development has gone into understanding AI/ML workloads, which are not only computationally intensive but also require vast amounts of data to train models and draw inferences. The impact of storage on AI/ML pipelines therefore merits additional study. This work addresses two research questions: (1) whether AI/ML workloads benefit from high-performance storage systems, and (2) whether such storage can be showcased through realistic approaches using vision-based training and inference workloads. The study evaluated the following storage-intensifying approaches: limiting system memory, performing data ingestion and training simultaneously, running parallelized training workloads, and performing inference on streaming AI workloads. Simultaneous data ingestion and training, along with inference, are observed to be the most storage intensive and are therefore recommended as ways to showcase storage. To support the analyses, we discuss the system resources taxed by AI workloads. Additionally, the work presents I/O analyses for these approaches in terms of access locality, I/O sizes, read/write ratio, and file offset patterns. The I/O traces of inference show a remarkably diverse distribution of random write request sizes. The PM9A3 could support such a challenging workload, generating 25x the I/O with only 3.4% overhead on inference time and 3x higher throughput compared to the MLPerf inference implementation.
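
The session text does not include implementation details for these approaches. As a rough illustration only, the sketch below shows one way the simultaneous-ingestion-and-training pattern could be emulated in Python: a background thread continuously writes new samples to a dataset directory while the main loop reads them back in batches, exercising storage on both the write and read paths. The directory name, the 1 MiB sample size, and the placeholder train_step are assumptions of ours, not details from the study.

# Minimal sketch of simultaneous data ingestion and training.
# Assumptions (ours, not the study's): a local "dataset" directory,
# synthetic 1 MiB samples, and a placeholder train_step.
import os
import threading
import time

DATA_DIR = "dataset"
SAMPLE_BYTES = 1 << 20          # 1 MiB synthetic "image" per sample
RUN_SECONDS = 30                # how long to keep both paths active

def ingest(stop_event: threading.Event) -> None:
    """Continuously write new samples to disk, emulating data ingestion."""
    i = 0
    while not stop_event.is_set():
        path = os.path.join(DATA_DIR, f"sample_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(SAMPLE_BYTES))   # incompressible payload
            f.flush()
            os.fsync(f.fileno())                # force the write to storage
        i += 1

def train_step(batch: list[bytes]) -> None:
    """Placeholder for a real training step (model forward/backward pass)."""
    time.sleep(0.01)

def train(stop_event: threading.Event, batch_size: int = 8) -> None:
    """Read recent samples back from disk in batches, emulating training-time reads."""
    while not stop_event.is_set():
        files = sorted(os.listdir(DATA_DIR))[-batch_size:]
        if len(files) < batch_size:
            time.sleep(0.1)
            continue
        batch = []
        for name in files:
            with open(os.path.join(DATA_DIR, name), "rb") as f:
                batch.append(f.read())
        train_step(batch)

if __name__ == "__main__":
    os.makedirs(DATA_DIR, exist_ok=True)
    stop = threading.Event()
    writer = threading.Thread(target=ingest, args=(stop,), daemon=True)
    reader = threading.Thread(target=train, args=(stop,), daemon=True)
    writer.start()
    reader.start()
    time.sleep(RUN_SECONDS)
    stop.set()
    writer.join()
    reader.join()

Running such a loop under an I/O tracer makes it possible to compare the resulting access patterns against a training-only baseline.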

Learning Objectives

  • Showcasing storage under real-world workloads
  • Understanding the implications of high-performance storage on AI workload performance
  • I/O analysis to guide the design and configuration of storage systems to meet the unique challenges posed by AI/ML workloads in terms of randomness, locality of reference, and I/O size distribution (a rough analysis sketch follows this list)

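The third objective above concerns I/O characterization. As a sketch only, assuming traces have already been collected and flattened into a CSV with op/offset/size columns (the column layout and the inference_io_trace.csv file name are hypothetical, not the study's actual tooling), the following computes a read/write ratio, a crude sequentiality measure for locality of reference, and the I/O size distribution.

# Minimal sketch of I/O trace analysis.
# Assumption (ours): a pre-collected trace in CSV form with columns op,offset,size,
# e.g. exported from a block-level or application-level tracer.
import csv
from collections import Counter

def analyze(trace_path: str) -> None:
    reads = writes = 0
    sequential = total = 0
    size_hist: Counter[int] = Counter()
    next_expected = None            # offset immediately after the previous request

    with open(trace_path, newline="") as f:
        for row in csv.DictReader(f):
            op = row["op"].upper()            # "R" or "W"
            offset = int(row["offset"])       # byte offset of the request
            size = int(row["size"])           # request size in bytes

            reads += op == "R"
            writes += op == "W"
            size_hist[size] += 1

            # Locality: a request is sequential if it starts where the last one ended.
            if next_expected is not None and offset == next_expected:
                sequential += 1
            next_expected = offset + size
            total += 1

    print(f"read/write ratio : {reads}:{writes}")
    print(f"sequential share : {sequential / total:.1%}")
    print("top I/O sizes    :", size_hist.most_common(5))

if __name__ == "__main__":
    analyze("inference_io_trace.csv")   # hypothetical trace file name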