At this year’s Hot Chips, Microsoft introduced Project Brainwave, its deep learning acceleration platform. The high-speed, ultra-low-latency system is designed for real-time artificial intelligence (AI): it lets developers deploy deep learning models onto field-programmable gate arrays (FPGAs), delivering performance that vastly exceeds that of a CPU or GPU.
“Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models.” — Doug Burger
Because it’s designed for real-time AI, Brainwave can process requests as fast as it receives them. “If it’s a video stream, if it’s a conversation, if it’s looking for intruders, anomaly detection, all the things where you care about interaction and quick results, you want those in real time,” explains Doug Burger, Distinguished Engineer at Microsoft.
“Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether they be search queries, videos, sensor streams, or interactions with users,” Burger elaborates. Besides giving users an answer instantaneously, low latency is integral to scaling machine learning deployments.
Layers Like an Onion
To achieve its goals, the Microsoft team engineered the Brainwave platform as three main layers.
First, Brainwave needed a “high-performance, distributed system architecture.” Microsoft is leveraging an infrastructure of FPGAs installed across the company’s many data centers. An FPGA lets a programmer reconfigure its hardware to serve a specific function; here, that function is lightning-fast AI inference supporting Microsoft’s hardware microservices. Brainwave loads a trained machine learning model into an FPGA’s memory, where it can serve requests for as long as necessary. If a model is too large for a single FPGA, the infrastructure spreads it across as many FPGAs as needed.
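The multi-FPGA idea above can be sketched in a few lines. This is our own illustrative toy, not Brainwave’s actual scheduler: it greedily assigns consecutive layers to devices by a per-FPGA memory budget, spilling onto a new FPGA whenever the current one fills up.

```python
def partition_layers(layer_sizes_mb, fpga_memory_mb):
    """Greedily assign consecutive layers to FPGAs by memory budget.

    Returns one list of layer indices per FPGA. Purely illustrative;
    a real system would also account for bandwidth and latency.
    """
    devices, current, used = [], [], 0.0
    for i, size in enumerate(layer_sizes_mb):
        if size > fpga_memory_mb:
            raise ValueError(f"layer {i} alone exceeds one FPGA's memory")
        if used + size > fpga_memory_mb:  # current FPGA is full
            devices.append(current)
            current, used = [], 0.0
        current.append(i)
        used += size
    if current:
        devices.append(current)
    return devices

# A model whose layers total 150 MB cannot fit in one 64 MB FPGA,
# so it spans three devices here.
print(partition_layers([30, 30, 30, 30, 30], 64))  # [[0, 1], [2, 3], [4]]
```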
Second, Brainwave required a powerful but versatile data processing unit (DPU). Brainwave’s DPUs are “soft”: they are synthesized onto the FPGAs. Many companies and startups take the opposite approach and build hardened DPUs. While the performance of these “hard” DPUs is cutting-edge, they are inflexible: their data types and operators must be chosen at design time and cannot be changed afterward. Brainwave’s design combines the FPGAs’ synthesizable logic with their digital signal processing blocks, so data types can be specified at synthesis time rather than frozen at design time. As a result, Brainwave’s soft DPUs can match or exceed the performance of hard DPUs while remaining highly adaptable to changes and optimizations.
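To see why a synthesis-time data type matters, consider a toy quantizer (our own sketch, not Brainwave’s implementation) that can be instantiated with different mantissa widths. A narrower type costs less hardware and runs faster; a wider one keeps more precision. The choice is a parameter, not a permanent design decision.

```python
import math

def quantize(x, mantissa_bits):
    """Round x to a float with the given number of mantissa bits.

    Illustrative only: shows how one parameter trades precision
    for hardware cost at synthesis time.
    """
    if x == 0.0:
        return 0.0
    exp = math.floor(math.log2(abs(x)))
    scale = 2.0 ** (mantissa_bits - exp)
    return round(x * scale) / scale

# A "synthesis-time" choice: coarse precision for cheap, fast hardware...
print(quantize(3.14159, 3))   # 3.25
# ...or finer precision where model accuracy demands it.
print(quantize(3.14159, 10))  # 3.140625
```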
Last but not least, in keeping with the theme of versatility, Brainwave includes a software stack designed to support the most popular deep learning frameworks. It currently supports Google’s TensorFlow and the Microsoft Cognitive Toolkit, with more in the works.
Talk the Talk, Walk the Walk
FPGAs have their share of critics: they are often viewed as inefficient compared with hardware built specifically for machine learning. Previous work on accelerating machine learning has also focused on optimizing throughput at the cost of latency. Burger wants the field to pay more attention to accelerators that can operate without bundling requests into batches. Microsoft has demonstrated that Brainwave excels at both speed and versatility.
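A back-of-the-envelope model (ours, not Microsoft’s) shows why batching hurts latency: a batched accelerator must wait for enough requests to arrive before it can start computing, so the first request in each batch pays a queueing penalty that batch-size-1 serving avoids.

```python
def latency_ms(batch_size, arrival_interval_ms, compute_ms_per_batch):
    """Worst-case latency for the first request in a batch.

    Toy model: latency = time waiting for the batch to fill,
    plus the batch's compute time.
    """
    wait = (batch_size - 1) * arrival_interval_ms  # time to fill the batch
    return wait + compute_ms_per_batch

# With requests arriving every 2 ms, a batch of 32 waits 62 ms before
# computing even starts; batch-1 serving pays only its own compute time.
print(latency_ms(32, 2.0, 5.0))  # 67.0
print(latency_ms(1, 2.0, 1.0))   # 1.0
```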
At Hot Chips, the Microsoft team debuted a version of Brainwave ported to Intel’s brand-new 14 nm Stratix 10 FPGA. Even while running a model five times larger than industry-standard benchmark networks such as ResNet-50 and AlexNet, Brainwave sustained 39.5 teraflops with no batching, and each request took less than a millisecond. In other words, Brainwave can deliver machine learning insights in real time. Keep in mind that the Stratix 10 is very new; with further optimization, some surmise that Brainwave on Stratix 10 could reach 90 teraflops.
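The reported numbers combine into a useful rule of thumb (our arithmetic, not Microsoft’s): at 39.5 teraflops, even a sub-millisecond latency budget allows tens of billions of floating-point operations per request.

```python
TFLOPS = 39.5e12    # sustained throughput reported at Hot Chips
latency_s = 1e-3    # "less than a millisecond" per request

# Work achievable within one request's latency budget.
flops_per_request = TFLOPS * latency_s
print(f"{flops_per_request / 1e9:.1f} GFLOPs per request")  # 39.5 GFLOPs per request
```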
Microsoft is far from the only company using custom hardware to accelerate machine learning; Google recently announced that the second iteration of its Tensor Processing Unit is in the works. For now, Microsoft is focusing on bringing Brainwave to Azure, its cloud computing platform, where customers will be able to run their most demanding deep learning models at unprecedented performance. The company is also working toward a day when third parties can run any trained machine learning model on Brainwave. Beyond that, there are no concrete release dates for Brainwave, but let’s hope that changes soon.
If this has you excited to jump into machine learning and artificial intelligence, check out our review of Intel’s Movidius Neural Compute Stick.