Nvidia Unveils a Revamped Server GPU and New Mellanox Hardware

The latest version of Nvidia's flagship A100 server GPU packs twice as much high-speed memory as its predecessor.

Six months after revealing a new flagship server GPU, Nvidia (NVDA) is giving it a memory upgrade while also launching powerful hardware and networking products aimed at researchers, supercomputer builders and cloud giants.

Nvidia’s 80GB A100 GPU

Nvidia’s A100 server GPU can now be purchased with 80GB of high-speed HBM2e memory -- twice as much as before -- the company announced on Monday morning.

For comparison, Nvidia’s most powerful gaming GPUs, the GeForce RTX 3090 and 3080, pack 24GB and 10GB, respectively, of GDDR6X graphics memory.

In addition to being sold on a standalone basis, the 80GB A100 will be offered via Nvidia’s DGX A100 server and HGX A100 server baseboards, which can pack up to eight A100 GPUs connected via Nvidia’s proprietary NVLink interconnect. Naturally, it will also support Nvidia’s DGX SuperPod solution for quickly building GPU-accelerated supercomputers.

Nvidia promises the 80GB A100 will deliver major performance gains relative to the 40GB version for certain workloads. Source: Nvidia.


Nvidia claims the A100’s memory boost enables major performance improvements for various AI model training, high-performance computing (HPC) and data analytics workloads, while also improving AI inference performance and allowing the GPU to be more power-efficient. The 40GB A100 -- the first GPU based on Nvidia’s Ampere architecture -- already delivers giant performance gains for training, inference and, to a lesser extent, traditional HPC workloads relative to its predecessor, the Tesla V100.

The DGX Station A100

Along with the 80GB A100, Nvidia unveiled the DGX Station A100, a powerful workgroup server for AI/HPC researchers that -- with the help of an advanced cooling system -- has been placed into a desktop form factor. The DGX Station A100 packs four 80GB or 40GB A100 GPUs and (echoing the standard DGX A100, which has two of them) a 64-core AMD (AMD) Epyc server CPU.

BMW, Lockheed Martin and Japanese mobile carrier NTT Docomo will be among the first buyers of the DGX Station A100. All three firms plan to use the server to aid their AI research efforts.

Nvidia's DGX Station A100 at a glance. Source: Nvidia.


“Typically, when customers want to build a supercomputer, they plan it, and then it takes months or years to build. And the first thing their users ask is, ‘When can I get the very first node, so that I can do software development?’,” said an Nvidia representative. “Well now...you can get your very first node [immediately], because [the DGX Station A100 has] the exact same GPU board that’s going into the supercomputer that you’re building.”

400-Gig Mellanox Interconnects

Nvidia’s Mellanox server interconnect unit is unveiling a slew of new products -- collectively known as the NDR 400G InfiniBand line -- that support 400 Gb/s InfiniBand connections. To date, Mellanox’s most powerful InfiniBand offerings have topped out at 200 Gb/s.

Mellanox’s 400-gig InfiniBand lineup is expected to start sampling in the second quarter of 2021. It’s being unveiled at a time when -- as Nvidia CEO Jensen Huang has stressed -- network connections have often emerged as a bottleneck for server nodes handling demanding AI and HPC workloads with the help of GPUs or other accelerators.

The NDR 400G InfiniBand lineup includes both a 400-gig adapter card and a 64-port switch promised to have three times the switch port density of its predecessor. There’s also a 400-gig data processing unit (DPU) that pairs network adapter functions with Arm CPU cores that can offload various network, storage and security processing functions, as well as copper cables and optical transceivers that can handle 400-gig connections.

Mellanox's NDR 400G InfiniBand lineup. Source: Mellanox.


Microsoft (MSFT), which has already teamed with Nvidia and Mellanox to provide on-demand access to supercomputer resources via Azure cloud computing instances, says it will support the NDR 400G InfiniBand line. Mellanox’s 400-gig offerings are being revealed two months after Nvidia announced its deal to buy Arm, and a month after Nvidia outlined an ambitious roadmap for launching 200-gig and (eventually) 400-gig DPUs that leverage both Arm CPU cores and an integrated GPU to offload functions from server CPUs.

Nvidia remains the dominant player in both the AI training and HPC/supercomputer accelerator markets, and is also a major player in the burgeoning market for AI inference server accelerators. Mellanox’s InfiniBand and Ethernet interconnects are likewise a common sight within both supercomputers and cloud data centers.

Nvidia’s announcements come amid the SC20 supercomputing conference, and ahead of an October quarter earnings report due on Wednesday afternoon. Nvidia execs will be delivering an SC20 presentation at 6 P.M. Eastern Time on Monday to go over their company’s latest offerings.

Nvidia, AMD and Microsoft are holdings in Jim Cramer’s Action Alerts PLUS Charitable Trust Portfolio.