In-SRAM Compute For Generative AI and Large Language Models

Wed Sep 20 | 4:35pm
Location: Salon IV
Abstract

The recent uptick in generative artificial intelligence (GAI) has put more pressure on hardware vendors to reduce the carbon footprint of running power-hungry large language models (LLMs) in the datacenter. One way to achieve a lower in-silicon power profile is to break the von Neumann bottleneck by tightly integrating traditional SRAM memory cells with interleaved programmable processors on the same die. We report on our progress in this area, in particular on leveraging recent open research in both mixed-precision mathematics and extreme low-bit quantization of deep learning model parameters and activations running on our custom "In-SRAM" processor.

Learning Objectives

  • Learn about the challenges of running generative AI and large language models in the datacenter.
  • Learn about a novel computer architecture, "In-SRAM" computing.
  • Learn about recent advances in new compressed data types suitable for large-scale deep learning models.

---

George Williams
GSI Technology