Moffett Antoum AI Inference Accelerator at Hot Chips 2023

August 30, 2023

Moffett Antoum AI Inference Accelerator _Page_18

Moffett AI showed off its new AI inference SoC at Hot Chips 2023. We have previously seen Moffett in MLPerf Inference v3.0 and previous generations.

Please excuse typos, this is being written live at the conference.

Moffett Antoum AI Inference Accelerator at Hot Chips 2023

Moffett says that lately, we are seeing many more large language models while the image or computer vision module explosion happened years ago.

Moffett Antoum AI Inference Accelerator _Page_03

These models have different training data set sizes. Here is an example of the differences. Usually, computer vision models are very small while the input sizes are very large. Language models are very different.

Moffett Antoum AI Inference Accelerator _Page_04

The Moffett Antoum is designed for sparsity in models both vision and language models.

Moffett Antoum AI Inference Accelerator _Page_05

The quad core CPU can run Linux.

Sparsity exists in tensor algebra because zeros occur naturally in the operation.

Moffett Antoum AI Inference Accelerator _Page_07

Sparsity can happen in weight sparsity that is well-known. There are also opportunities to exploit sparsity in the input data, conditional sparsity, and more.

Moffett Antoum AI Inference Accelerator _Page_08

Moffett is using a compile aware simulator to see how to map models and sparsity to its accelerators.

Moffett Antoum AI Inference Accelerator _Page_09

This is the SoC architecture. Since it is using sparsity, it can have slower external interfaces like PCIe Gen3.

Moffett Antoum AI Inference Accelerator _Page_11

Going into some of these blocks will be the next.

Within the NNCore subsystem, of which there are four on the chip, there is a SPU. SPU stands for sparse processing unit.

Moffett Antoum AI Inference Accelerator _Page_12

Here is the SPU clusters datapath. This one is a fairly complex and dense slide that I am not going to have time to transcribe.

Moffett Antoum AI Inference Accelerator _Page_13

The NNCore also has a custom Vector Processing Unit or VPU to do vector processing and can handle INT8 and FP16 data types.

Moffett Antoum AI Inference Accelerator _Page_14

There are also domain-specific accelerators.

Moffett Antoum AI Inference Accelerator _Page_15

There is a core-to-core interconnect to move data around and do things like share caches.

Moffett Antoum AI Inference Accelerator _Page_16

The hybrid sparsity helps accelerate sparse LLM’s.

Moffett Antoum AI Inference Accelerator _Page_17

Here are the key specs, including a 70W TDP and 800MHz frequency.

Here are the SparseOne AI inference cards.

Moffett Antoum AI Inference Accelerator _Page_19

Moffett has SparseOne toolchain to run models on its cards.

Moffett Antoum AI Inference Accelerator _Page_21

Here is a bit more about the toolchain. It seems like Moffett is using half of its talk or so on software which is usually a good sign.

Moffett Antoum AI Inference Accelerator _Page_22

Sparsity can be traded for throughput, and also accuracy for throughput. The benchmarks seem to show that the Moffett S4 is faster than the NVIDIA Tesla T4.

Moffett Antoum AI Inference Accelerator _Page_24

This is Moffett’s multi-card solution.

Moffett Antoum AI Inference Accelerator _Page_25

Here is the performance of 8x Moffett S30 cards.

Moffett Antoum AI Inference Accelerator _Page_26

Moffett cannot use sparsity in the MLPerf closed division, so it has to be submitted in the open category.

Moffett Antoum AI Inference Accelerator _Page_27

This was the demo.

Moffett Antoum AI Inference Accelerator _Page_28

Here is the super-resolution increase from 9fps to 59fps in this demo at similar quality.

Moffett Antoum AI Inference Accelerator _Page_29

There is another demo.

Moffett Antoum AI Inference Accelerator _Page_30

And another one.

Moffett Antoum AI Inference Accelerator _Page_31

Here is the summary wall of text.

Moffett Antoum AI Inference Accelerator _Page_32

Final Words

Overall these are cool accelerators. The bigger question is who are the customers for its AI parts today, and who will be the customers in the future. Cerebras, for example, has announced ~$1B in deal value over the next 18 months.

Still, it will be interesting to see how this develops.

Moffett Antoum AI Inference Accelerator at Hot Chips 2023

Final Words

RELATED ARTICLESMORE FROM AUTHOR

The Top NVIDIA HGX B200 Server Supermicro SYS-422GA-NBRT-LCC at OCP 2024

Meta Announces AMD Instinct MI300X for AI Inference and NVIDIA GB200 Catalina

AMD Pensando Pollara 400 UltraEthernet RDMA NIC Launched

LEAVE A REPLY

RELATED ARTICLES MORE FROM AUTHOR