Storage Acceleration via Decoupled SDS Architecture

Abstract

Software-Defined Storage (SDS) frameworks offer storage services with high availability, durability, and reliability guarantees. However, these guarantees come at a performance cost. While drives can offer microsecond latencies and millions of IOPS, SDS services typically offer millisecond access latencies and throughput of 10-100K IOPS.

We present an architecture that decouples the data read paths so that reads bypass the SDS stack, which is the bottleneck. The SDS stack shares logical-to-physical storage mappings with a companion module. The module can cache these mappings, which enables the creation of direct I/O paths to the drives under SDS control. We also describe mechanisms whereby the SDS stack can forcibly evict the shared mappings as needed.
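The mapping-sharing and eviction idea above can be illustrated with a minimal sketch. It is not the PoC's code and uses no Ceph or SPDK API: `MappingCache`, `PhysicalExtent`, `read_block`, and the `direct_read` / `sds_read` callables are hypothetical names chosen for illustration, and the sketch assumes fixed-size blocks identified by a (volume, block) pair.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class PhysicalExtent:
    """Hypothetical physical location of one logical block."""
    drive: str   # illustrative drive identifier
    offset: int  # byte offset on that drive


class MappingCache:
    """Companion-module cache of logical-to-physical mappings (illustrative)."""

    def __init__(self) -> None:
        self._map: Dict[Tuple[str, int], PhysicalExtent] = {}

    def share(self, volume: str, block: int, extent: PhysicalExtent) -> None:
        # Called when the SDS stack shares a mapping with the module.
        self._map[(volume, block)] = extent

    def evict(self, volume: str, block: Optional[int] = None) -> None:
        # Called by the SDS stack to forcibly invalidate mappings.
        # With block=None, every cached mapping for the volume is dropped.
        if block is not None:
            self._map.pop((volume, block), None)
            return
        for key in [k for k in self._map if k[0] == volume]:
            del self._map[key]

    def lookup(self, volume: str, block: int) -> Optional[PhysicalExtent]:
        return self._map.get((volume, block))


def read_block(
    cache: MappingCache,
    volume: str,
    block: int,
    direct_read: Callable[[PhysicalExtent], bytes],
    sds_read: Callable[[str, int], bytes],
) -> Tuple[str, bytes]:
    """Use the direct path on a cache hit, otherwise fall back to the SDS stack."""
    extent = cache.lookup(volume, block)
    if extent is not None:
        return "direct", direct_read(extent)
    return "sds", sds_read(volume, block)
```

In this sketch a read takes the direct path only while a shared mapping is cached; once the SDS stack calls `evict`, subsequent reads fall back to the regular SDS path until a mapping is shared again.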

We share our experience implementing a proof-of-concept (PoC) of this approach in OpenStack Ceph. The PoC uses SPDK to create direct NVMe-over-Fabrics (NVMe-oF) I/O paths into Ceph logical volumes, also known as RADOS Block Device (RBD) images. Our microbenchmark results with this PoC demonstrate an order-of-magnitude reduction in read latency and a 4-10x increase in throughput. Finally, we discuss future opportunities in reducing CPU usage, utilizing hardware assists such as IPU/DPU/sNIC, and workload-specific optimizations for the fast read paths into SDS.

Learning Objectives

Understand I/O performance bottlenecks in SDS architecture
List challenges in extending NVMe-oF to distributed storage
Describe benefits of user-mode storage stacks like SPDK
Analyze alternate architectural choices for I/O path optimization
Create experiments to accurately measure storage I/O performance
