On Thursday morning, Nvidia unveiled the A100, the first GPU to rely on its next-gen Ampere architecture. The A100, said by Nvidia to already be in full production, packs a massive 54 billion transistors. That figure easily eclipses the 21 billion transistors used by Nvidia’s prior flagship server GPU, the Tesla V100, which relies on the older Volta architecture.
The A100 is being shown off during a recorded keynote by CEO Jensen Huang that starts airing at 9 A.M. Eastern time. The keynote was originally meant to take place during Nvidia’s GTC conference, which was scheduled for March but cancelled due to the COVID-19 pandemic.
A Versatile Flagship GPU
Whereas the V100 is made using a 12-nanometer (12nm) Taiwan Semiconductor (TSM) manufacturing process, the A100 is made using a more advanced 7nm TSMC process. It also comes with 40GB of HBM2 graphics memory, up from the V100’s 32GB, and has access to 1.6TB/s of memory bandwidth, up from the V100’s 900GB/s.
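For a rough sense of scale, the generational gaps can be worked out directly from the figures quoted in this article. The short Python sketch below does only that arithmetic; the dictionary and function names are illustrative and do not come from Nvidia.

```python
# Figures as quoted in this article; all names here are illustrative only.
V100 = {"transistors_bn": 21, "memory_gb": 32, "bandwidth_gbs": 900}
# 1.6TB/s is treated as 1,600GB/s for the comparison.
A100 = {"transistors_bn": 54, "memory_gb": 40, "bandwidth_gbs": 1600}


def generational_ratios(old, new):
    """Return new/old for each spec, rounded to two decimal places."""
    return {key: round(new[key] / old[key], 2) for key in old}


ratios = generational_ratios(V100, A100)
for spec, ratio in ratios.items():
    print(f"{spec}: {ratio}x")
```

By these numbers, the A100 carries roughly 2.6 times the V100’s transistor count, while its memory capacity grows by a more modest 1.25 times.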
The A100 was designed to handle both the computationally demanding task of training AI/deep learning models to do things such as understand voice commands and detect objects within photos, and the running of trained models against real-world data and content (inference). It’s also meant to handle traditional high-performance computing (HPC) workloads such as modeling and simulation, as well as traditional enterprise data science and machine learning workloads.
In addition, with the latest version of the popular Apache Spark analytics engine (Spark 3.0) supporting GPU acceleration, Nvidia asserts that the A100 can handle big data processing via Spark (including for AI-related workloads) much more efficiently than server CPUs, which have historically handled the task. The company claims Adobe (ADBE) achieved a 7x performance improvement by using GPUs to accelerate Spark 3.0 on the Databricks analytics software platform.
The breadth of the workloads that the A100 is meant to handle makes it a successor not only to the V100, which launched in mid-2017 and is often used for AI training and HPC workloads, but also to the cheaper Tesla T4 GPU, which launched in late 2018, relies on Nvidia’s Turing architecture and is often used to perform inference. This flexibility, Huang argued during a press briefing, will allow cloud data centers to use server resources more efficiently, and also speed up the data pipeline between training and inference workloads.
“The data center is the future computing unit,” Huang declared. He also predicted that the A100’s horsepower will help drive the creation of “some really gigantic AI models.”
While its exact performance gains relative to the V100 and T4 will depend on the workload involved, Nvidia did claim that the A100 outperforms the V100 by 6x and 7x, respectively, when handling training and inference workloads involving the popular BERT natural language-processing model.
And with the help of next-gen Tensor Cores (processing cores dedicated to deep learning operations), Nvidia claims up to a 20x performance increase when training AI models using relatively demanding 32-bit arithmetic or performing inference while relying on the frequently used INT8 data type. A relatively modest 2.5x performance gain is claimed for HPC workloads requiring 64-bit arithmetic.
In addition, if a particular workload doesn’t need the A100’s full resources, the GPU can be partitioned into up to seven separate instances that can handle different tasks.
Other Product Announcements
Along with the A100, Nvidia is launching the DGX A100, an enterprise server that packs eight A100 GPUs and retails for $199,000. The DGX A100, which succeeds the V100-powered DGX-2 server, connects its GPUs with the help of a next-gen version of Nvidia’s NVLink interconnect technology that doubles its bandwidth. It also packs 15TB of flash storage, six 200Gb/s network cards from recently acquired Mellanox Technologies and a pair of 64-core AMD (AMD) Epyc CPUs.
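The per-component figures above add up to some sizable system-level totals. The following Python sketch simply multiplies out the numbers quoted in this article; the variable names are illustrative, and the totals are plain sums rather than specifications published by Nvidia.

```python
# Per-system figures as quoted in this article; variable names are illustrative.
GPUS_PER_SYSTEM = 8
GPU_MEMORY_GB = 40        # HBM2 per A100
NETWORK_CARDS = 6
CARD_SPEED_GBITS = 200    # Gb/s per Mellanox network card
CPUS = 2
CORES_PER_CPU = 64

# Simple aggregates across the whole server.
total_gpu_memory_gb = GPUS_PER_SYSTEM * GPU_MEMORY_GB
total_network_gbits = NETWORK_CARDS * CARD_SPEED_GBITS
total_cpu_cores = CPUS * CORES_PER_CPU

print(f"GPU memory: {total_gpu_memory_gb}GB")
print(f"Network: {total_network_gbits}Gb/s")
print(f"CPU cores: {total_cpu_cores}")
```

That works out to 320GB of GPU memory, 1,200Gb/s of combined network card bandwidth and 128 CPU cores in a single DGX A100.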
Nvidia is also unveiling several other offerings today. These include:
- The EGX A100, a platform for building A100-powered servers meant to handle edge computing workloads.
- The EGX Jetson Xavier NX, a low-power computing board that relies on Nvidia’s Xavier SoC and is meant for microservers and embedded systems.
- Merlin, a framework that Nvidia promises will dramatically reduce the time needed to build AI-powered recommendation systems.
- Omniverse, a graphics/simulation software platform that allows visual effects pros around the globe to collaborate in real time.
In addition, Nvidia announced that BMW plans to deploy robots based on the company’s Isaac robotics platform within its factories, and that Chinese electric car maker Xpeng Motors is launching a car that relies on Nvidia’s Drive AGX computing platform.
The Big Picture
With Nvidia currently possessing both a dominant position in the training accelerator market and a large position in the fast-growing inference accelerator market, the A100 should see wide adoption among cloud giants making large AI-related investments, both for internal workloads and cloud computing instances offered to third parties. Nvidia noted that (among others) AWS, Microsoft Azure and the Google Cloud Platform (GCP) will be supporting A100-powered instances, and Huang went so far as to say that he expects the A100 will “be in every cloud.”
Potentially giving an additional boost to the A100's near-term sales: The GPU is rolling out at a time when cloud server spending has been growing strongly, as Internet/cloud giants try to cope with traffic spikes caused by the COVID-19 pandemic.
The A100 is also launching during a time of intensifying competition in the AI accelerator space, as training and/or inference accelerators are rolled out by everyone from Intel (INTC) and AMD, to cloud giants such as Google (GOOG) and Amazon.com (AMZN), to Chinese tech firms such as Huawei and Alibaba (BABA), to startups such as privately held Cerebras Systems. Nvidia is wagering that the A100's performance and flexibility, together with the company's large software investments and ecosystem, will help it maintain its training dominance and continue gaining inference share.
Though no announcement has been made yet, the Ampere architecture is also widely expected to power Nvidia’s next-gen RTX gaming GPU line. Some reports have stated that a pair of high-end, Ampere-based gaming GPUs known as the GeForce RTX 3080 and 3080 Ti will be launching in the coming months.