At NVIDIA GTC 2026, KIOXIA made one of the more interesting storage announcements of the show. The company revealed two new SSD products targeting the rapidly growing AI inference market, and the positioning here is meaningfully different from standard data center NVMe fare. These are drives designed explicitly for GPU-initiated access, a use case that will only become more important as AI model complexity outpaces local HBM capacities. NVIDIA outlined the key reason at the show: there will be many more agents making many more requests of storage than in years past, and the KV cache is becoming an important, and meaningfully different, class of storage.
NVIDIA Storage-Next and Why It Matters
To understand what KIOXIA is doing here, it helps to understand NVIDIA’s Storage-Next initiative. The program calls on SSD vendors to engineer drives that GPUs can access directly, effectively extending the GPU’s usable memory hierarchy beyond the limits of on-package High Bandwidth Memory. As AI workloads shift from being purely compute-intensive to increasingly data-intensive (think trillion-parameter models and multi-million-token context windows), the bottleneck shifts toward memory capacity rather than raw FLOPS. DRAM simply cannot keep up with those demands at any reasonable cost.

Storage-Next is NVIDIA’s architectural answer. It pulls high-performance flash into the GPU’s memory space so that data-hungry workloads like KV caches for large-scale inference have somewhere to live. That, in turn, requires a step-function increase in IOPS (NVIDIA is targeting 100 million) along with better handling of small transfer sizes to keep GPUs fed. This is an architecture designed to keep PCIe buses utilized so that the GPUs are not idle waiting for data. For this, KIOXIA has the GP Series.
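For a rough sense of scale, here is the arithmetic behind that 100 million IOPS target. The block sizes below are our illustrative assumptions, not figures from NVIDIA or KIOXIA:

```python
# Back-of-envelope: bandwidth implied by an IOPS target at a given block size.
# 100M IOPS is NVIDIA's stated Storage-Next target; block sizes are assumptions.
def bandwidth_gbps(iops: float, block_bytes: int) -> float:
    """Return the bandwidth in GB/s needed to sustain `iops` at `block_bytes` per I/O."""
    return iops * block_bytes / 1e9

target_iops = 100e6
for block in (512, 4096):
    print(f"{block} B blocks: {bandwidth_gbps(target_iops, block):.1f} GB/s")
# 512 B blocks: 51.2 GB/s
# 4096 B blocks: 409.6 GB/s
```

At 4K blocks, 100 million IOPS would demand roughly 410 GB/s, far beyond any near-term drive interface, which is why fine-grained transfers, not bigger ones, are the design point.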
KIOXIA GP Series: Super High IOPS SSD
The GP Series is the headline product. It uses KIOXIA’s XL-FLASH storage class memory rather than conventional TLC NAND, which has a few implications. XL-FLASH is an SLC-based storage class memory that KIOXIA has had in its portfolio for some time. It trades raw density for latency and IOPS, which is exactly the tradeoff needed when the GPU is doing fine-grained, low-latency reads rather than the large sequential transfers that data center SSDs typically optimize for. KIOXIA is specifically highlighting 512B access granularity with the GP Series, far finer than the typical 4K minimum of conventional NVMe SSDs. That matters a great deal when serving attention head activations or KV cache lookups from GPU-directed requests rather than host CPU I/O, since 4K accesses would leave the PCIe bus underutilized.
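To see why 512B lines up so neatly with KV cache access patterns, consider the per-head data sizes involved. The model parameters here (head dimension, dtype) are illustrative assumptions typical of recent LLMs, not anything from KIOXIA's announcement:

```python
# One attention head's K+V vectors for one token, at common LLM dimensions.
# head_dim and dtype are illustrative assumptions, not KIOXIA specs.
head_dim = 128          # per-head dimension, common in recent LLMs
bytes_per_elem = 2      # FP16/BF16
kv_pair = 2             # one key vector plus one value vector

bytes_per_head_token = kv_pair * head_dim * bytes_per_elem
print(bytes_per_head_token)  # 512 bytes

# Fetching that through a 4 KiB-granularity SSD reads 8x the needed data:
waste = 1 - bytes_per_head_token / 4096
print(f"{waste:.1%} of each 4 KiB read wasted")  # 87.5%
```

Under these assumptions, a 4K-granularity drive wastes seven-eighths of the bus bandwidth on every single-head fetch, while a 512B-granularity drive wastes none of it.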
KIOXIA CM9 Series: PCIe 5.0 E3.S for KV Cache
The CM9 is the more near-term product and takes a different angle. Where the GP Series is a novel architecture play, the CM9 is a high-capacity, high-endurance PCIe 5.0 E3.S SSD aimed at KV cache workloads in large-scale AI inference clusters. We have previously covered the announcement of the KIOXIA CM9 PCIe Gen5 NVMe SSDs. KIOXIA’s 3 DWPD rating on a 25.6 TB drive is worth doing the math on: 76.8 TB of writes per day.
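The endurance math is simple but worth spelling out. The capacity and DWPD figures are from the announcement; the five-year horizon is our assumption of a typical enterprise warranty period:

```python
# CM9 endurance arithmetic from the rated specs in the announcement.
capacity_tb = 25.6   # CM9 capacity, TB
dwpd = 3             # drive writes per day rating

daily_writes_tb = capacity_tb * dwpd
print(daily_writes_tb)  # 76.8 TB of writes per day

# Over an assumed 5-year warranty period (our assumption, not a KIOXIA spec):
warranty_years = 5
lifetime_pb = daily_writes_tb * 365 * warranty_years / 1000
print(f"~{lifetime_pb:.0f} PB written over {warranty_years} years")  # ~140 PB
```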

For inference infrastructure that is writing and invalidating KV cache entries at high throughput, this is the kind of endurance specification that starts to make sense. To give some sense of why this matters, imagine an agent or sub-agent that is spun up for a task, needs its KV cache data, and then lives for only a minute before it is destroyed. KIOXIA is positioning the CM9 alongside NVIDIA’s Context Memory Storage (CMX) architecture, which defines how inference systems should tier memory between HBM, DRAM, and high-performance storage.
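To put a rough number on that churn, here is a standard KV cache sizing calculation for one agent's context. All model parameters are illustrative assumptions (roughly a 70B-class LLM with grouped-query attention), not tied to any specific deployment:

```python
# Rough KV cache footprint for one agent's fully-populated context window.
# All parameters below are illustrative assumptions, not vendor figures.
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_elem = 2           # FP16/BF16
context_tokens = 128 * 1024  # 128K-token context window

# 2x for keys and values, per layer, per KV head, per token.
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * context_tokens
print(f"{kv_bytes / 1e9:.1f} GB per fully-populated context")  # 42.9 GB
```

Under these assumptions, every short-lived agent that materializes and discards a full context writes tens of gigabytes; do that every minute across a node and the writes add up to terabytes per hour, which is exactly where a 3 DWPD rating earns its keep.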
Final Words
The GP Series is really the more provocative of the two announcements. XL-FLASH has been around for some time, but it has typically served as a hot data tier to accelerate CPU access to storage. Now, with GPUs and agents driving the I/O, there is room for a new class of storage that behaves differently because it is designed specifically for GPU applications. Positioning it as a GPU-accessible memory extension under the NVIDIA Storage-Next banner gives it a much more specific and defensible use case. Whether that translates into design wins depends largely on how aggressively NVIDIA pushes Storage-Next adoption in its next generation of GPU platforms and systems, but the timing alongside GTC is not accidental, and NVIDIA is making a major push on storage with its STX racks.
2027 might be when storage gets really exciting again.