Moffett Antoum AI Inference Accelerator at Hot Chips 2023

Moffett Antoum AI Inference Accelerator _Page_18

Moffett AI showed off its new AI inference SoC at Hot Chips 2023. We have previously covered Moffett in MLPerf Inference v3.0 and earlier rounds.

Please excuse typos, this is being written live at the conference.

Moffett Antoum AI Inference Accelerator at Hot Chips 2023

Moffett says that lately, we are seeing many more large language models, while the image or computer vision model explosion happened years ago.

Moffett Antoum AI Inference Accelerator _Page_03

These model types differ greatly in scale. Here is an example of the differences: computer vision models are usually very small while their input sizes are very large. Language models are very different.

Moffett Antoum AI Inference Accelerator _Page_04

The Moffett Antoum is designed to exploit sparsity in both vision and language models.

Moffett Antoum AI Inference Accelerator _Page_05

The quad-core CPU can run Linux.

Sparsity exists in tensor algebra because zeros occur naturally in these operations.

Moffett Antoum AI Inference Accelerator _Page_07

Sparsity comes in several forms. Weight sparsity is the best known, but there are also opportunities to exploit sparsity in the input data, conditional sparsity, and more.

Moffett Antoum AI Inference Accelerator _Page_08
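To make the weight-sparsity case concrete, here is a minimal NumPy/SciPy sketch of why it helps: once most weights are zero, only the non-zero values and their indices need to be stored and multiplied. The matrix size and the 90% sparsity ratio below are illustrative assumptions, not Moffett figures.

```python
# Minimal sketch of weight sparsity: storing only the non-zero weights shrinks
# the matrix, and the matmul can skip zero multiplies.
# The 90% sparsity ratio is an illustrative assumption, not a Moffett figure.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Dense weight matrix with ~90% of entries pruned to zero.
dense_w = rng.standard_normal((1024, 1024)).astype(np.float32)
dense_w[rng.random(dense_w.shape) < 0.9] = 0.0

# Compressed sparse row format keeps only the non-zero weights plus indices.
sparse_w = sparse.csr_matrix(dense_w)
x = rng.standard_normal((1024, 64)).astype(np.float32)

y_dense = dense_w @ x    # multiplies every element, zeros included
y_sparse = sparse_w @ x  # touches only the ~10% non-zero weights

print(sparse_w.nnz / dense_w.size)                # fraction of weights kept (~0.1)
print(np.allclose(y_dense, y_sparse, atol=1e-4))  # same result either way
```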

Moffett is using a compile-aware simulator to see how to map models and their sparsity onto its accelerators.

Moffett Antoum AI Inference Accelerator _Page_09

This is the SoC architecture. Since it exploits sparsity, less data has to move over the external interfaces, so it can get away with slower links like PCIe Gen3.

Moffett Antoum AI Inference Accelerator _Page_11

Next, we will go into some of these blocks.

Within the NNCore subsystem, of which there are four on the chip, there is an SPU, or Sparse Processing Unit.

Moffett Antoum AI Inference Accelerator _Page_12

Here is the SPU cluster's datapath. This is a fairly complex and dense slide that I am not going to have time to transcribe.

Moffett Antoum AI Inference Accelerator _Page_13

The NNCore also has a custom Vector Processing Unit, or VPU, for vector processing that can handle INT8 and FP16 data types.

Moffett Antoum AI Inference Accelerator _Page_14
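As a generic illustration of what supporting INT8 alongside FP16 implies on the software side, here is a minimal symmetric per-tensor quantization sketch. This is the standard mapping from floating point to INT8, not Moffett's actual quantization scheme.

```python
# Hedged sketch of symmetric INT8 quantization: one common way a floating-point
# tensor is mapped onto the INT8 data type a vector unit supports.
# Generic illustration only, not Moffett's scheme.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float tensor to INT8 with a single per-tensor scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(8).astype(np.float16)
q, scale = quantize_int8(x.astype(np.float32))
print(x)
print(dequantize(q, scale))  # close to x, within quantization error
```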

There are also domain-specific accelerators.

Moffett Antoum AI Inference Accelerator _Page_15

There is a core-to-core interconnect to move data around and do things like share caches.

Moffett Antoum AI Inference Accelerator _Page_16

The hybrid sparsity helps accelerate sparse LLMs.

Moffett Antoum AI Inference Accelerator _Page_17
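Whatever Moffett's exact definition of hybrid sparsity is, a generic way to see why combining sparsity types helps is that a static form (pruned weights) and a dynamic form (zero activations) compound: the fraction of useful multiply-accumulates is roughly the product of the two densities. The ratios below are assumptions for illustration only.

```python
# Generic, hedged illustration of combining sparsity types: pruned weights plus
# ReLU-zeroed activations both remove multiplies, and their savings multiply.
# Not necessarily what Moffett means by "hybrid sparsity".
import numpy as np

rng = np.random.default_rng(1)

w = rng.standard_normal((4096, 4096)).astype(np.float32)
w[rng.random(w.shape) < 0.9] = 0.0              # ~90% weight sparsity (assumed)
x = np.maximum(rng.standard_normal(4096), 0.0)  # ReLU output, ~50% zeros

weight_density = np.count_nonzero(w) / w.size
act_density = np.count_nonzero(x) / x.size

# Fraction of MACs that actually involve two non-zero operands.
useful = weight_density * act_density
print(f"weight density {weight_density:.2f} x activation density {act_density:.2f} "
      f"= {useful:.3f} of the dense MACs do useful work")
```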

Here are the key specs, including a 70W TDP and 800MHz frequency.

Moffett Antoum AI Inference Accelerator _Page_18

Here are the SparseOne AI inference cards.

Moffett Antoum AI Inference Accelerator _Page_19

Moffett has a SparseOne toolchain to run models on its cards.

Moffett Antoum AI Inference Accelerator _Page_21

Here is a bit more about the toolchain. It seems like Moffett is spending around half of its talk on software, which is usually a good sign.

Moffett Antoum AI Inference Accelerator _Page_22
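We did not capture the toolchain's API here, but vendor inference toolchains like this typically start from a framework-exported model such as ONNX. Here is a hedged sketch of what that first step usually looks like; only the standard PyTorch export is shown, since the SparseOne-specific compilation and deployment steps were not captured.

```python
# Hedged sketch of the typical entry point for a vendor inference toolchain:
# export a trained model to ONNX, then hand the file to the vendor compiler.
# Only the standard PyTorch export is shown; the SparseOne-specific steps are
# not, because they were not captured here.
import torch
import torchvision

# weights=None keeps the sketch offline; a real flow would use trained weights.
model = torchvision.models.resnet50(weights=None).eval()
example_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    example_input,
    "resnet50.onnx",  # this file would then go through the vendor toolchain
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},
)
```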

Sparsity can be traded for throughput, and accuracy can also be traded for throughput. The benchmarks seem to show that the Moffett S4 is faster than the NVIDIA Tesla T4.

Moffett Antoum AI Inference Accelerator _Page_24
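Behind the sparsity-for-throughput and accuracy-for-throughput trade-off is a simple knob: how aggressively the weights are pruned. Here is a minimal magnitude-pruning sketch; the thresholding approach and the target ratios are illustrative assumptions, not Moffett's pruning method.

```python
# Minimal magnitude-pruning sketch: zero out the smallest weights until a
# target sparsity ratio is reached. Higher ratios mean more potential speedup
# on a sparse engine, at the cost of accuracy if pushed too far.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of w with the smallest-magnitude fraction set to zero."""
    threshold = np.quantile(np.abs(w), sparsity)
    pruned = w.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

w = np.random.randn(1024, 1024).astype(np.float32)
for ratio in (0.5, 0.9, 0.95):
    pruned = magnitude_prune(w, ratio)
    kept = np.count_nonzero(pruned) / pruned.size
    print(f"target sparsity {ratio:.2f} -> weights kept {kept:.3f}")
```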

This is Moffett’s multi-card solution.

Moffett Antoum AI Inference Accelerator _Page_25

Here is the performance of 8x Moffett S30 cards.

Moffett Antoum AI Inference Accelerator _Page_26

Moffett cannot use sparsity in the MLPerf closed division, so it has to submit in the open division.

Moffett Antoum AI Inference Accelerator _Page_27

This was the demo.

Moffett Antoum AI Inference Accelerator _Page_28

In this demo, super-resolution performance increased from 9fps to 59fps, roughly a 6.5x speedup, at similar quality.

Moffett Antoum AI Inference Accelerator _Page_29

There is another demo.

Moffett Antoum AI Inference Accelerator _Page_30

And another one.

Moffett Antoum AI Inference Accelerator _Page_31

Here is the summary wall of text.

Moffett Antoum AI Inference Accelerator _Page_32

Final Words

Overall, these are cool accelerators. The bigger question is who the customers for Moffett's AI parts are today, and who they will be in the future. Cerebras, for example, has announced ~$1B in deal value over the next 18 months.

Still, it will be interesting to see how this develops.
