Network Stacks for Storage Developers: A survey of all of the tricks to make your network stack fly
Fri Sep 30 | 12:50pm
Abstract
Writing a storage network server based on TCP sockets seems straightforward: you call send() and recv() on the socket and parse the data. The reality is not so simple. For example, most people know that sockets should be grouped together so that polling the set via epoll, kqueue, or io_uring is more efficient. But did you know that it matters which sockets are grouped together, and that there are at least three different ways to group them?

This talk will cover the best practices for writing network stacks on Linux, FreeBSD, and Windows, with a specific focus on protocols that look like NVMe-oF or iSCSI. We'll start with the basic system calls available through POSIX and the epoll and kqueue APIs on Linux and FreeBSD. We'll briefly look at how the Win32 APIs work and at the undocumented AFD features, then spend a significant amount of time on io_uring. We'll cover the best ways to use non-blocking sockets and decide whether we should speculatively attempt to receive extra data while parsing. We'll also cover the very latest Linux features, such as zero copy transmit support and application device queues. We'll then look at a wide range of io_uring features, such as FIXED_FILES, FIXED_BUFFERS, RECVSEND_POLL_FIRST, PROVIDE_BUFFERS, and MSG_WAITALL, and discuss how an io_uring-based stack operates best. We'll close with some experimental features for the future, such as DMA offload on receive operations and eBPF-based protocol parsing. The talk covers nearly a decade of tips, tricks, and design advice that apply broadly to many storage-related protocols.
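The abstract asks whether a parser should speculatively receive extra data. The following is a minimal sketch of that trade-off, not the speaker's implementation. It is Linux-only and uses Python's `select.epoll`. It compares two strategies on a non-blocking socket registered in an epoll set. In the "exact" strategy, the parser asks recv() only for the bytes it currently needs. In the "speculative" strategy, it asks for a large chunk and buffers the leftovers.

Several details are assumptions made for illustration:
- The PDU format is hypothetical: a 48-byte header whose first 4 bytes hold the payload length.
- `SPEC_BUF_LEN` is an arbitrary size.
- `Conn`, `make_pdu`, and `demo` are hypothetical names.
- A `socketpair()` stands in for a real TCP connection.

```python
import select
import socket
import struct

# Hypothetical fixed header size (assumption; iSCSI's basic header is also 48 bytes).
HDR_LEN = 48
# Speculative receive size; an arbitrary choice for this sketch.
SPEC_BUF_LEN = 8192


def make_pdu(payload: bytes) -> bytes:
    """Hypothetical PDU: 4-byte big-endian payload length, zero-padded to HDR_LEN."""
    return struct.pack("!I", len(payload)).ljust(HDR_LEN, b"\0") + payload


class Conn:
    """A non-blocking socket registered in an epoll set, plus a receive buffer."""

    def __init__(self, sock: socket.socket, ep: select.epoll, speculative: bool):
        sock.setblocking(False)
        ep.register(sock.fileno(), select.EPOLLIN)
        self.sock, self.ep, self.speculative = sock, ep, speculative
        self.buf = bytearray()
        self.recv_calls = 0  # counts recv() calls that returned data or EOF

    def _fill(self, want: int) -> None:
        """Ensure at least `want` bytes are buffered, waiting on epoll if needed."""
        while len(self.buf) < want:
            # Speculative mode asks for a large chunk; exact mode asks only
            # for the bytes the parser needs right now.
            size = SPEC_BUF_LEN if self.speculative else want - len(self.buf)
            try:
                data = self.sock.recv(size)
            except BlockingIOError:
                self.ep.poll()  # block until the socket becomes readable
                continue
            self.recv_calls += 1
            if not data:
                raise ConnectionError("peer closed mid-PDU")
            self.buf += data

    def read_pdu(self) -> bytes:
        """Parse one PDU from the buffer, receiving more bytes as needed."""
        self._fill(HDR_LEN)
        (length,) = struct.unpack_from("!I", self.buf)
        self._fill(HDR_LEN + length)
        payload = bytes(self.buf[HDR_LEN:HDR_LEN + length])
        del self.buf[:HDR_LEN + length]  # keep any bytes of the next PDU
        return payload


def demo(speculative: bool):
    """Send two PDUs back to back and return both payloads and the recv() count."""
    a, b = socket.socketpair()  # stand-in for a TCP connection
    ep = select.epoll()
    try:
        conn = Conn(a, ep, speculative)
        b.sendall(make_pdu(b"x" * 512) + make_pdu(b"y" * 100))
        first, second = conn.read_pdu(), conn.read_pdu()
        return first, second, conn.recv_calls
    finally:
        ep.close()
        a.close()
        b.close()
```

When both PDUs are already queued, the speculative connection should pick them up in a single recv() call. The exact connection makes four calls: header, payload, header, payload. The cost of the speculative approach is that payload bytes land in a staging buffer and must be copied out or tracked, rather than being received directly into their final destination.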
Learning Objectives
- How to best design a storage-centric network stack
- Catch up on all of the latest networking features
- Learn what new networking features are coming in the future
---
Benjamin Walker
Intel