Computational Storage Service: Analytical Data Platform with Intelligent Analytics Offload

Mon Sep 16 | 11:35am
Location: Lafayette/San Tomas
Abstract

As industries grapple with an ever-expanding and complex sea of data, there is a paramount need to rethink storage and analytics. For decades, the industry has tried to push analytics closer to the data to speed up queries and reduce costs – with varying degrees of success. Data warehouses can accelerate repeated queries on known relevant data, but they require significant copying and pre-processing of data and get very expensive as datasets grow. New “data lakehouse” approaches aim to directly query raw data objects in storage, but introduce new data formats, metadata catalogs, and additional compute resources. At AirMettle, we approach this differently – leveraging existing data formats, storage server infrastructure, and even commodity SSDs to dramatically accelerate analytics, lower costs, and reduce power consumption. AirMettle’s Analytical Data Platform is a trailblazing software-defined object storage service that accelerates Big Data analytics operations by up to 100 times compared to conventional methods.

In this talk, we will discuss the nature of data, how this influenced the development of our Analytical Data Platform, and how its internal architecture stores and processes semi-structured data. We will showcase use cases and performance results that demonstrate the speed and agility the platform brings to the analytics landscape. Furthermore, we will share insights into our vision and progress in delegating processing to commodity storage devices.

Learning Objectives

Describe why existing storage services can provide parallel I/O but not analytics.
Describe the key challenges that have prevented storage devices from providing computational acceleration.
Describe the flexibility “serverless” APIs can unlock in enabling access to scalable in-storage analytics.
Describe the inherent performance advantages in “serverless” in-storage analytics.


---

Donpaul Stephens
AirMettle, Inc.