At a San Francisco event held on Tuesday, Qualcomm unveiled the Cloud AI 100, a server accelerator card solution meant to handle AI inference -- the running of trained AI models against new data and content, such as a voice assistant command, text from a web page or photos just uploaded from a smartphone. The AI 100, which is powered by a chip built from the ground up to handle inference, will begin sampling in the second half of 2019 and enter volume production in 2020.
Qualcomm is also unveiling three new Snapdragon system-on-chips (SoCs) -- the Snapdragon 730, 730G and 665 -- that are somewhat less powerful (and presumably cheaper) than its flagship Snapdragon 855 SoC, which will power many of the flagship Android phones launching this year. Relative to predecessor chips such as the Snapdragon 710 and 660, Qualcomm promises meaningful improvements in areas such as graphics rendering, image processing and AI inference; and in the case of the Snapdragon 730G, the company asserts the chip was optimized to provide superb gaming experiences.
Qualcomm's AI Competition and Performance Claims
While a lot of server inference work is still handled by Intel's Xeon CPUs, the market for accelerators that can perform this work more efficiently has begun taking off. Nvidia, long the dominant player in the market for accelerators used to handle the demanding task of training AI models, has recently seen inference-related sales for its Tesla server GPU family surge; the company argues there's a benefit to using Nvidia GPUs to perform inference on AI models trained using other Nvidia GPUs.
Intel and Xilinx, for their part, have each seen growing sales of programmable chips (FPGAs) for inference work. In addition, cloud giants such as Alphabet/Google and Amazon.com have developed custom chips (ASICs) that can perform server inference, and Intel and a host of startups are also developing inference ASICs.
Now Qualcomm is unveiling an inference ASIC of its own -- one that (although rivals may beg to differ here) the company claims is more than 10 times as powerful as "the industry's most advanced AI inference solutions available today," while also delivering superior power efficiency and supporting (depending on what a particular client wants) a number of different sub-100-watt power envelopes.
The server inference accelerator market is growing rapidly. Source: Qualcomm.
When asked about the AI 100's performance and power efficiency in an interview with TheStreet, Qualcomm SVP Keith Kressin said his company is being "purposefully vague" for now. However, he did indicate the chip would deliver a peak inference performance of more than 350 trillion operations per second (TOPS), or more than 50 times that of the Snapdragon 855. For comparison, Nvidia asserts its recently launched Tesla T4 GPU, which features a 75-watt power envelope, can deliver 130 TOPS of inference performance while working with 8-bit (INT8) integers and 260 TOPS when working with 4-bit (INT4) integers.
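To put these quoted figures side by side, here is a rough back-of-the-envelope sketch in Python. It only divides the vendor-stated peak numbers above, treats the "more than" figures as if they were exact, and assumes Qualcomm's 350 TOPS is comparable to Nvidia's INT8 or INT4 figures, even though the numeric precision behind Qualcomm's number wasn't disclosed. The per-watt helper for the Cloud AI 100 takes a hypothetical wattage, since Qualcomm hasn't said which of its sub-100-watt envelopes delivers the peak figure.

```python
# Peak-throughput figures as quoted in this article (vendor claims, not benchmarks).
CLOUD_AI_100_TOPS = 350.0    # Qualcomm: "more than 350" TOPS peak (precision unstated)
SNAPDRAGON_855_RATIO = 50.0  # Qualcomm: "more than 50 times" the Snapdragon 855
T4_INT8_TOPS = 130.0         # Nvidia Tesla T4, INT8
T4_INT4_TOPS = 260.0         # Nvidia Tesla T4, INT4
T4_WATTS = 75.0              # Tesla T4 power envelope


def implied_855_tops() -> float:
    """Snapdragon 855 throughput implied by dividing Qualcomm's two figures.

    Both inputs are lower bounds ("more than"), so this is only a rough estimate.
    """
    return CLOUD_AI_100_TOPS / SNAPDRAGON_855_RATIO


def ratio_to_t4(t4_tops: float) -> float:
    """Cloud AI 100 peak TOPS divided by a given Tesla T4 peak TOPS figure."""
    return CLOUD_AI_100_TOPS / t4_tops


def t4_tops_per_watt(t4_tops: float) -> float:
    """Tesla T4 peak TOPS per watt at its 75-watt envelope."""
    return t4_tops / T4_WATTS


def cloud_ai_100_tops_per_watt(assumed_watts: float) -> float:
    """Hypothetical: Qualcomm has not said which power envelope yields 350 TOPS."""
    return CLOUD_AI_100_TOPS / assumed_watts


if __name__ == "__main__":
    print(f"Implied Snapdragon 855 peak: ~{implied_855_tops():.1f} TOPS")
    print(f"Cloud AI 100 vs T4 (INT8): {ratio_to_t4(T4_INT8_TOPS):.2f}x")
    print(f"Cloud AI 100 vs T4 (INT4): {ratio_to_t4(T4_INT4_TOPS):.2f}x")
    print(f"T4 INT8 efficiency: {t4_tops_per_watt(T4_INT8_TOPS):.2f} TOPS/W")
```

On these numbers, the Cloud AI 100's claimed peak works out to roughly 2.7 times the T4's INT8 figure and about 1.35 times its INT4 figure. Peak TOPS ratios say little about real-world throughput or latency on actual models, which is part of why rivals may dispute Qualcomm's broader claims.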
Kressin argues Qualcomm's history of developing a variety of low-power signal-processing solutions via its Snapdragon family helped it craft a product with superior performance per watt. He also highlighted Qualcomm's use of a cutting-edge, 7-nanometer (7nm), Taiwan Semiconductor manufacturing process, which has also been embraced by a slew of other big-name chip developers.
Engagements with Cloud Giants
Kressin indicated Qualcomm will initially focus on selling the Cloud AI 100 to "tier-1" cloud providers, which for the time being account for a large share of inference accelerator purchases. The press release announcing the Cloud AI 100 includes a positive quote from an executive at Microsoft's Azure cloud services unit, and both Microsoft and Facebook will be making appearances at Qualcomm's Tuesday event.
Such partnerships shouldn't be seen as exclusive. Microsoft has been making heavy use of Intel and Xilinx FPGAs within its data centers for inference work, and is also using Nvidia GPUs. Facebook, which has historically depended on CPUs for inference, has disclosed it's a development partner for Intel's NNP-I inference ASIC, which is due to enter production later this year.
Nonetheless, obtaining support from two tier-1 cloud giants that each spend massive sums annually on data center hardware isn't a bad start for Qualcomm. And it's possible that more tie-ups will be announced in time.
"We're talking to all the major players, all the tier-1 cloud guys, both in the U.S. and China," said Kressin. He also noted that Qualcomm, like some others in this space, has worked to support popular AI/machine learning software frameworks and runtimes, as well as provide optimizations for compiler software that are specific to its hardware. Along the way, the company has been able to leverage some of its investments in software solutions for helping developers improve the performance of AI workloads on Snapdragon SoCs.
Qualcomm promises extensive software support for the Cloud AI 100. Source: Qualcomm.
The Big Picture
The inference accelerator market is quite competitive, and Qualcomm is arriving relatively late to the show. But the company has made some impressive performance claims for the Cloud AI 100, and has clearly put effort into creating a product built for the needs of cloud giants. That could help it win some clients in a market that's unlikely to be a winner-takes-all one.
And just as it has made sense for Qualcomm to invest in growing its automotive, IoT, RF and Wi-Fi chip businesses at a time when smartphone sales are sluggish and a lot of uncertainty surrounds the company's massive patent-licensing business, there's certainly some logic to going after a fast-growing inference accelerator market with the help of silicon and software expertise built up while developing other products.