In-SRAM Compute For Generative AI and Large Language Models

GSI Technology

Abstract

The recent uptick in generative artificial intelligence (GAI) has put more pressure on hardware vendors to reduce the carbon footprint of running power-hungry large language models (LLMs) in the datacenter. One way to achieve a lower in-silicon power profile is to break the von Neumann bottleneck by tightly integrating traditional SRAM memory cells with interleaved programmable processors on the same die. We report on our progress in this area, in particular on leveraging recent open research in both mixed-precision mathematics and extreme low-bit quantization of deep learning model parameters and activations running on our custom "In-SRAM" processor.

Learning Objectives

Learn about the challenges of running generative AI and large language models in the datacenter.
Learn about a novel computer architecture, "In-SRAM" computing.
Learn about recent advances in new compressed data types suitable for large-scale deep learning models.
