Big News From GTC 2017 China NVIDIA CUDA 9 Launch

September 27, 2017


At GTC 2017 in China (NVIDIA holds multiple GTC events each year), NVIDIA announced that CUDA 9 is now available. That is a major milestone for the HPC and AI industries, since each new CUDA release generally brings support for new architectures along with libraries optimized for cutting-edge applications. NVIDIA CUDA 9 has been available in release candidate form for some time, but the new tooling has now reached general availability (GA).

New NVIDIA CUDA 9 Features

If you want to get a full overview, the NVIDIA Parallel Forall blog has an in-depth look at the new features of NVIDIA CUDA 9. We suggest giving it a read:

https://devblogs.nvidia.com/parallelforall/cuda-9-features-revealed/

The NVIDIA Developer site lists the key features as:

Speed up high-performance computing (HPC) and deep learning apps with new GEMM kernels in cuBLAS
Execute image and signal processing apps faster with performance optimizations across multiple GPU configurations in cuFFT and NVIDIA Performance Primitives
Solve linear and graph analytics problems common in HPC with new algorithms in cuSOLVER and nvGRAPH
Express rich parallel algorithms with threads from sub-tiles to warps, blocks, and grids
Manage and reuse threads efficiently within an application with new API and function primitives
Optimize and pre-fetch memory access by identifying source code causing page faults in unified memory
Inspect unified memory performance bottlenecks with new event filters based on virtual address, migration reason and page fault access type
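
For readers unfamiliar with the term in the first item above, GEMM is the BLAS general matrix-multiply operation, C = alpha * A * B + beta * C, which cuBLAS runs on the GPU. The snippet below is only a plain-Python reference sketch of that arithmetic, not cuBLAS code; the function name `gemm` and the list-of-lists matrix layout are our own assumptions for illustration.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a x b) + beta * c.

    Matrices are lists of rows; a is m x k, b is k x n, c is m x n.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b must match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")
    return [
        [
            # Dot product of row i of a with column j of b, then scale and add.
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]
```

With alpha set to 1 and beta set to 0 this reduces to an ordinary matrix product, which is the form deep learning frameworks lean on most heavily.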

The CUDA 9 release also adds a number of Volta and NVLink support items:

Replace warp-synchronous programming with robust programming model on Kepler architecture and above
Execute AI applications faster with Tensor Cores performing 5X faster than Pascal GPUs
Scale multi-GPU applications with next-generation NVLink delivering 2X throughput of prior generation
Increase GPU utilization with Volta Multi-Process Service (MPS)
Profile PCIe usage by analyzing bandwidth of memory transfers, latency, and comparison with NVLink
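
To give a feel for the first item above, warp-level code typically exchanges values between the lanes of a warp, and CUDA 9 makes the synchronization in that exchange explicit through primitives such as `__shfl_down_sync` and the Cooperative Groups tile method `shfl_down`. The snippet below is a CPU-side simulation in Python of the tree reduction those shuffles are commonly used for. It is a sketch under our own assumptions, not GPU code: the function name `warp_reduce_sum` is hypothetical, and a Python list stands in for the per-lane registers.

```python
def warp_reduce_sum(lane_values):
    """Simulate a shuffle-down tree reduction across the lanes of one warp."""
    width = len(lane_values)
    if width == 0 or width & (width - 1):
        raise ValueError("lane count must be a power of two")
    vals = list(lane_values)
    offset = width // 2
    while offset > 0:
        # Every lane reads the previous step's values at the same time,
        # which is the behavior the explicit synchronization guarantees.
        # A lane whose source would fall outside the warp reads its own value.
        vals = [
            vals[i] + (vals[i + offset] if i + offset < width else vals[i])
            for i in range(width)
        ]
        offset //= 2
    # Only lane 0 is guaranteed to hold the sum of all lanes.
    return vals[0]
```

For a 32-lane warp the loop runs five steps (offsets 16, 8, 4, 2, 1). Building each step's list from the previous step's list in one pass is what models the lanes moving in lockstep.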

STH will be updating many of our nvidia-docker images with the new CUDA 9 after testing.
