In this work we show how, without changing a single line of framework code, we can boost deep learning training performance by up to 2X and inference performance by up to 2.7X on top of the software optimizations currently available in open source TensorFlow* and Caffe* on Intel® Xeon® and Intel® Xeon Phi™ processors. Our system-level optimizations yield higher throughput and a shorter time-to-train for a given batch size per worker, compared to the current baseline, for image recognition Convolutional Neural Network (CNN) workloads.
Intel® Xeon® and Intel® Xeon Phi™ processors are used extensively in deep learning and high performance computing applications. Intel software teams have optimized popular deep learning frameworks such as TensorFlow*, Caffe*, and MXNet* to deliver high performance on Intel platforms for both deep learning training and inference workflows. Through the continuing collaboration between Intel and Google, TensorFlow performance has improved significantly with Intel® Math Kernel Library (Intel® MKL) and Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). Similarly, the Intel® Distribution of Caffe* delivers significant performance gains on Intel Xeon and Intel Xeon Phi processors.
Training deep Convolutional Neural Networks (CNNs) such as ResNet-50, GoogLeNet-v1, Inception-3, and others involves …