I recently had a chance to visit Intel's San Diego offices, where, following its 2016 acquisition of San Diego-based AI chip startup Nervana Systems, the company does much of its AI-related silicon and software R&D work. At the meeting were Arjun Bansal, Intel's VP of AI Software and Research, and Casimir Wierzynski, a Senior Director for Intel's AI Research group.
Where the Server AI Chip Market Stands Today
The market for chips used to handle AI/deep learning workloads within data centers can be split into two groups:
Chips used by powerful computing systems that train AI models to do things such as understand voice commands, detect objects within photos or help a car drive itself around city streets.
Chips that run trained AI models against new data and content -- for example, a request by a mobile app to help process a voice command or deliver personalized news feed content. This activity, known as inference, is much less computationally demanding than training and can be handled both by servers and by end-user hardware such as phones, PCs and cars.
Though competition is starting to pick up a bit, a very large percentage of AI training work is still handled by Nvidia's Tesla server GPU family. A lot of server inference work, by comparison, has historically been handled by Intel's Xeon server CPUs. However, a growing portion of inference work is now being handled by accelerators such as Nvidia GPUs, programmable chips (FPGAs) from Intel and Xilinx (XLNX), and custom-designed chips (ASICs) such as Alphabet/Google's (GOOGL) Tensor Processing Units (TPUs, which can also be used for training) and Amazon.com's (AMZN) new AWS Inferentia chip.
Intel's Server AI Chip Strategy
Whereas Nvidia's server AI chip efforts revolve completely around GPUs featuring specialized processing cores for AI workloads (they're known as Tensor Cores), Intel plans to support a wide array of chip architectures. Its current and planned server AI offerings include:
The NNP-L1000 and NNP-I, a pair of Nervana ASICs that are respectively meant for training and inference. Both are promised to enter production later this year. Facebook has been a development partner for Intel's AI ASICs.
FPGAs that can be used for inference. Microsoft and Baidu use Intel's FPGAs for AI work.
A server GPU lineup. The company's first server GPU(s) are expected in 2020.
DL Boost, a set of technologies meant to improve the inference performance of Xeon CPUs. The first version of DL Boost was introduced with Intel's recently-unveiled Cascade Lake Xeon CPUs.
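To give a sense of what low-precision inference features like DL Boost accelerate, here is a minimal Python sketch of int8 quantization. Cascade Lake's DL Boost centers on 8-bit integer math (the VNNI instructions); the symmetric quantization scheme and names below are a generic illustration, not Intel's exact implementation:

```python
def quantize(xs, bits=8):
    """Map floats onto symmetric signed ints; return the ints and the scale."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = max(abs(x) for x in xs) / qmax
    return [round(x / scale) for x in xs], scale

w = [0.5, -1.25, 2.0]                     # toy weights
a = [1.0, 0.75, -0.5]                     # toy activations
wq, ws = quantize(w)
aq, as_ = quantize(a)

# The heavy multiply-accumulate runs in integer arithmetic, with a single
# float rescale at the end -- this is the work int8 hardware speeds up.
approx = sum(wi * ai for wi, ai in zip(wq, aq)) * ws * as_
exact = sum(wi * ai for wi, ai in zip(w, a))
```

The integer result tracks the float one to within quantization error, which is why int8 is generally viable for inference even when training still uses wider formats.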
When asked about the competitive strengths of the NNP-L1000 relative to offerings such as Nvidia's Tesla GPUs, Bansal was eager to point out that the chip has been designed from the ground up to train AI/deep learning models and thus doesn't have to concern itself with graphics-related functions. "We don't have to spend any die area on graphics-related compute," he said.
He also pointed out that because the NNP-L1000's processing architecture relies on a number encoding format known as bfloat16, the chip can use 16-bit multiplier circuits to deliver results comparable to what GPUs require 32-bit multipliers for. This makes the multipliers smaller and more power-efficient, and, since each value takes up half as many bits, doubles the chip's effective memory bandwidth.
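The bfloat16 format itself is easy to demystify: it is simply a float32 with the low 16 mantissa bits dropped, keeping the full 8-bit exponent and therefore float32's numeric range. A short sketch using only Python's standard library (function names are mine):

```python
import struct

def to_bfloat16(x: float) -> int:
    """Truncate a value (encoded as float32) to its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16

def from_bfloat16(b: int) -> float:
    """Expand 16 bfloat16 bits back into a float32 value."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# The 8-bit exponent survives intact, so huge values keep their scale;
# only mantissa precision (23 bits down to 7 bits) is lost.
from_bfloat16(to_bfloat16(3.14159))   # ~3.1406, about 0.03% error
from_bfloat16(to_bfloat16(1e30))      # still on the order of 1e30
```

Deep learning training tolerates that precision loss well, which is why bfloat16 lets a chip halve its multiplier width and per-value memory traffic without the range headaches of the older fp16 format.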
Along similar lines, Bansal argued that in the inference space, the NNP-I will be very competitive relative to FPGAs "from a power-performance perspective," and deliver strong performance for workloads such as machine translation, speech recognition and recommendation systems. At the same time, he noted that some customers will still prefer FPGAs due to their ability to be reconfigured to handle new tasks.
When asked about how Intel sees server CPUs being used for inference as demand for accelerators takes off, he suggested that companies will still use idle CPU capacity for inference work. "People have a lot of dormant [server] capacity at times," he noted.
The Importance of Software
In addition to its large chip R&D investments, Nvidia's dominant position in the AI training silicon market has much to do with the developer ecosystem it has built out. This ecosystem is underpinned by the company's CUDA programming model and related CUDA Deep Neural Network (cuDNN) software library, which supports the most popular deep learning software frameworks (and some less-popular ones).
Though it has created deep learning software libraries that are optimized for its CPUs, Intel's strategy for chipping away at Nvidia's massive developer mindshare doesn't revolve around creating a direct rival to CUDA and cuDNN, but around driving adoption of a solution known as nGraph. nGraph is a compiler -- a program that translates code from a programming language into machine code that can be executed by a processor -- meant to work with a variety of deep learning frameworks across a variety of processor types (Xeon CPUs, Nervana ASICs and even Nvidia GPUs) for both training and inference work.
Intel argues that since many AI software frameworks have been optimized for a particular kind of processor (in many cases, Nvidia's GPUs), it's often too difficult today to port an AI model relying on one type of processor to another type of processor, and that it can also be too hard to get a model to run on a different framework. nGraph, the company insists, does away with such challenges.
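The pattern being described, one hardware-neutral model graph with interchangeable per-target backends, can be sketched in a few lines of Python. This is a toy illustration of the concept, not nGraph's actual API:

```python
# A model is expressed once as a hardware-neutral graph of ops...
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

def run(node, backend, env):
    """Recursively evaluate a graph using one backend's kernel table."""
    if node.op == "input":
        return env[node.inputs[0]]          # leaf: look up a fed-in value
    args = [run(i, backend, env) for i in node.inputs]
    return backend[node.op](*args)

# ...and each backend supplies its own kernels. In a real compiler these
# would lower to CPU, GPU or ASIC code rather than Python lambdas.
CPU = {"mul": lambda a, b: a * b,
       "add": lambda a, b: a + b,
       "relu": lambda a: max(a, 0.0)}
ACCEL = dict(CPU)                           # stand-in for a tuned backend

# The model y = relu(x*w + b), written once, runnable on any backend.
x, w, b = Node("input", "x"), Node("input", "w"), Node("input", "b")
y = Node("relu", Node("add", Node("mul", x, w), b))
run(y, CPU, {"x": 2.0, "w": 3.0, "b": -10.0})   # relu(-4) -> 0.0
```

Because the graph never mentions a processor, retargeting a model means swapping the kernel table rather than rewriting the model -- which is the portability argument Intel is making for nGraph.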
While hand-optimized AI software libraries can be effective when a company is relying on just one processing architecture, they become far less practical when a company is using three or four of them. "And there are advantages for having three or four instead of having one," Bansal asserted.
The challenge for Intel, of course, is convincing enterprises and cloud giants that they should be using more than one architecture at a time when many are exclusively relying on Nvidia's GPUs for AI training. If a company chooses to keep solely relying on Nvidia's GPUs for training, it will likely stick with Nvidia's widely-supported software tools. On the other hand, if the AI training silicon market starts to fragment, Intel's sales pitch for nGraph becomes a lot stronger.
Separately, Wierzynski pointed out that Intel is also investing in software solutions to address AI privacy concerns, such as an open-source solution for the processing of encrypted AI data. One use case he gave for the solution: A hospital could send encrypted data to a radiologist working remotely with no patient info being shared, and the radiologist could send back an encrypted version of his or her answer to the hospital.
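Intel hasn't detailed that solution's internals here, but the underlying idea of homomorphic encryption can be shown with a toy example: under textbook RSA (tiny primes, wildly insecure, illustration only), multiplying two ciphertexts multiplies the hidden plaintexts, so a party can compute on data it never sees in the clear:

```python
# Toy demo of the homomorphic principle -- NOT Intel's actual library.
p, q = 61, 53                       # insecure toy primes; n = 3233
n = p * q
e, d = 17, 2753                     # public/private exponents, e*d = 1 mod 3120

encrypt = lambda m: pow(m, e, n)
decrypt = lambda c: pow(c, d, n)

a, b = 7, 6
c = (encrypt(a) * encrypt(b)) % n   # computed without ever seeing a or b
decrypt(c)                          # -> 42, i.e. a * b
```

Practical schemes for encrypted deep learning support richer operations than multiplication, but the privacy property is the same one shown here: the radiologist's hardware operates only on ciphertexts.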
The Big Picture
It's unlikely that Nvidia will relinquish its current lead in the AI training processor market anytime soon, particularly given that it's also investing heavily in the space. And while the server inference processor market is more competitive, Intel could end up being one of several formidable players there, along with the likes of Nvidia and Xilinx.
However, Intel does have a unique silicon and software strategy for growing its AI accelerator sales, and is clearly putting its money where its mouth is. And though much remains to be shared about their exact performance and power consumption, the fact that the company's Nervana ASICs are being built from the ground up to handle AI work could help them achieve some success.