Moffett AI showed off its new AI inference SoC at Hot Chips 2023. We have previously seen Moffett in MLPerf Inference v3.0 and in previous generations.
Please excuse typos, this is being written live at the conference.
Moffett Antoum AI Inference Accelerator at Hot Chips 2023
Moffett says that lately, we are seeing many more large language models, while the computer vision model explosion happened years ago.
![Moffett Antoum AI Inference Accelerator _Page_03](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_03-800x450.jpg)
These models have very different sizes and training data sets. Here is an example of the differences. Computer vision models are usually very small while their input sizes are very large. Language models are very different.
![Moffett Antoum AI Inference Accelerator _Page_04](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_04-800x450.jpg)
The Moffett Antoum is designed to exploit sparsity in both vision and language models.
![Moffett Antoum AI Inference Accelerator _Page_05](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_05-800x450.jpg)
The quad-core CPU can run Linux.
Sparsity exists in tensor algebra because zeros occur naturally in the operands.
![Moffett Antoum AI Inference Accelerator _Page_07](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_07-800x450.jpg)
Weight sparsity is the well-known form, but there are also opportunities to exploit sparsity in the input data, conditional sparsity, and more.
![Moffett Antoum AI Inference Accelerator _Page_08](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_08-800x450.jpg)
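To make the weight-sparsity idea concrete, here is a minimal sketch of a compressed sparse row (CSR) matrix-vector multiply that only performs the multiplies for stored nonzero weights. This is a generic illustration of how skipping zeros saves work, not Moffett's actual datapath or storage format.

```python
import numpy as np

def csr_from_dense(w):
    """Convert a dense weight matrix to CSR (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]          # positions of nonzero weights in this row
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Matrix-vector product that touches only the stored nonzero weights."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        start, end = row_ptr[r], row_ptr[r + 1]
        y[r] = values[start:end] @ x[col_idx[start:end]]
    return y

# A 75%-sparse weight matrix: only 4 of the 16 multiplies are actually performed.
w = np.array([[0., 2., 0., 0.],
              [0., 0., 0., 3.],
              [1., 0., 0., 0.],
              [0., 0., 4., 0.]])
x = np.array([1., 2., 3., 4.])
vals, cols, ptrs = csr_from_dense(w)
print(csr_matvec(vals, cols, ptrs, x))   # same result as the dense w @ x
```

The same principle (store and compute only nonzeros) is what lets a sparse accelerator trade model sparsity for effective throughput.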
Moffett is using a compile-aware simulator to see how to map models and sparsity to its accelerators.
![Moffett Antoum AI Inference Accelerator _Page_09](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_09-800x450.jpg)
This is the SoC architecture. Since sparsity shrinks the data that needs to move on and off the chip, it can get away with slower external interfaces like PCIe Gen3.
![Moffett Antoum AI Inference Accelerator _Page_11](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_11-800x450.jpg)
Next, we will go into some of these blocks.
Within the NNCore subsystem, of which there are four on the chip, there is an SPU, or Sparse Processing Unit.
![Moffett Antoum AI Inference Accelerator _Page_12](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_12-800x450.jpg)
Here is the SPU cluster's datapath. This is a fairly complex and dense slide that I am not going to have time to transcribe.
![Moffett Antoum AI Inference Accelerator _Page_13](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_13-800x450.jpg)
The NNCore also has a custom Vector Processing Unit (VPU) that handles INT8 and FP16 data types.
![Moffett Antoum AI Inference Accelerator _Page_14](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_14-800x450.jpg)
There are also domain-specific accelerators.
![Moffett Antoum AI Inference Accelerator _Page_15](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_15-800x450.jpg)
There is a core-to-core interconnect to move data around and do things like share caches.
![Moffett Antoum AI Inference Accelerator _Page_16](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_16-800x450.jpg)
The hybrid sparsity helps accelerate sparse LLMs.
![Moffett Antoum AI Inference Accelerator _Page_17](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_17-800x450.jpg)
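Moffett's slides do not spell out its exact sparsity format, but a common structured scheme in the industry is N:M pruning, where only the N largest-magnitude weights survive in each group of M. Here is a hypothetical sketch of 2:4 pruning as a general illustration of how structured sparsity is imposed on a weight tensor; it is not Moffett's specific hybrid scheme.

```python
import numpy as np

def prune_n_of_m(w, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m (e.g. a 2:4 pattern)."""
    flat = w.reshape(-1, m)
    mask = np.zeros_like(flat, dtype=bool)
    # For each group, find the positions of the n largest magnitudes and keep them.
    keep = np.argsort(np.abs(flat), axis=1)[:, -n:]
    np.put_along_axis(mask, keep, True, axis=1)
    return (flat * mask).reshape(w.shape)

w = np.array([[0.1, -0.9, 0.4, 0.05, 0.7, -0.2, 0.3, 0.8]])
pruned = prune_n_of_m(w)
print(pruned)   # each group of 4 retains exactly 2 nonzero weights
```

A fixed N:M pattern is hardware-friendly because the accelerator knows in advance how many nonzeros each group contributes, which keeps the compute units fully utilized.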
Here are the key specs, including a 70W TDP and 800MHz frequency.
![Moffett Antoum AI Inference Accelerator _Page_18](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_18-800x450.jpg)
Here are the SparseOne AI inference cards.
![Moffett Antoum AI Inference Accelerator _Page_19](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_19-800x450.jpg)
Moffett has a SparseOne toolchain to run models on its cards.
![Moffett Antoum AI Inference Accelerator _Page_21](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_21-800x450.jpg)
Here is a bit more about the toolchain. It seems like Moffett is spending around half of its talk on software, which is usually a good sign.
![Moffett Antoum AI Inference Accelerator _Page_22](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_22-800x450.jpg)
Sparsity can be traded for throughput, and also accuracy for throughput. The benchmarks seem to show that the Moffett S4 is faster than the NVIDIA Tesla T4.
![Moffett Antoum AI Inference Accelerator _Page_24](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_24-800x450.jpg)
This is Moffett’s multi-card solution.
![Moffett Antoum AI Inference Accelerator _Page_25](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_25-800x450.jpg)
Here is the performance of 8x Moffett S30 cards.
![Moffett Antoum AI Inference Accelerator _Page_26](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_26-800x450.jpg)
Moffett cannot use sparsity in the MLPerf closed division, so it has to submit in the open category.
![Moffett Antoum AI Inference Accelerator _Page_27](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_27-800x450.jpg)
This was the demo.
![Moffett Antoum AI Inference Accelerator _Page_28](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_28-800x450.jpg)
Here is the super-resolution demo, increasing from 9fps to 59fps at similar quality.
![Moffett Antoum AI Inference Accelerator _Page_29](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_29-800x450.jpg)
There is another demo.
![Moffett Antoum AI Inference Accelerator _Page_30](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_30-800x450.jpg)
And another one.
![Moffett Antoum AI Inference Accelerator _Page_31](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_31-800x450.jpg)
Here is the summary wall of text.
![Moffett Antoum AI Inference Accelerator _Page_32](https://www.servethehome.com/wp-content/uploads/2023/08/Moffett-Antoum-AI-Inference-Accelerator-_Page_32-800x450.jpg)
Final Words
Overall, these are cool accelerators. The bigger question is who the customers are for its AI parts today, and who they will be in the future. Cerebras, for example, has announced ~$1B in deal value over the next 18 months.
Still, it will be interesting to see how this develops.