Michael Wong

VP of Technology, Codeplay

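Michael Wong chairs SG14, the C++ study group for embedded development, and SG19, the study group for machine learning. He also chairs the C++ language direction and evolution committee, is Vice President of Research and Development at Codeplay, and heads the Canadian delegation to the C++ Standards Committee. Michael has extensive experience in C++ parallel computing, high-performance computing, and machine learning. He led the development of the SYCL standard, a heterogeneous C++ programming language for GPU application development, and has deep research experience and insight into low-level TensorFlow performance optimization. His work spans parallel programming, neural networks, computer vision, and autonomous driving. Michael was previously a senior technical expert at IBM, where he led development of the IBM XL C++ and XL C compilers.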

Talk Topic

Accelerating TensorFlow Machine Learning with Heterogeneous C++ and SYCL

Programming Models for Self-Driving Cars with SYCL and Heterogeneous C++

When writing software to deploy deep neural network inferencing, developers face an overwhelming range of options, from a custom-coded implementation of a single model to a deep learning framework such as TensorFlow or Caffe. If you custom-code your own implementation, how do you balance the competing needs of performance, portability and capability? If you use an off-the-shelf framework, how do you get good performance?

Codeplay has been building and standardizing developer tools for GPUs and AI accelerators for over 15 years. This talk will explore the approaches available for implementing deep neural networks in software, from the low-level details of how to map software to the highly parallel processors needed for AI all the way up to major AI frameworks. It will start with the LLVM compiler chain used to compile for most GPUs, move through the OpenCL, HSA and SYCL programming standards (including how they compare with CUDA), and finish with TensorFlow and Caffe and how they affect key metrics such as performance.

SYCL is a heterogeneous C++ language that provides the building blocks for such C++ libraries, bridging the gap between hardware-agnostic C++ features and C++ abstractions of hardware features. A SYCL implementation has also been released as a free-to-download Community Edition called ComputeCpp, to help you build higher-level abstractions for neural networks and machine vision, all leading to the ability to program self-driving cars.

As Chair of the C++ Standard’s SG14, where game developers, financial traders, and embedded device programmers have been demanding a heterogeneous programming model, I have been studying programming models that offer lessons for enabling a future ISO C++ to support heterogeneous devices. There are a great many of them. My search has taken me through SYCL, HPX, Agency, HCC, OpenMP, OpenACC, OpenCL, C++ AMP, Halide, CUDA, Kokkos, RAJA and many others.

Yet as performance and power efficiency become the holy grail of modern C++ applications, the hardware solutions that deliver them differ greatly in their architectural decisions and designs. The combination of CPUs, GPUs, FPGAs and custom domain-specific hardware is gaining a lot of momentum. In view of this, C++ programming techniques and features are changing as well. Modern C++ standards are enabling more and more parallelism and heterogeneity in the library and language features. This talk will compare many of the most popular models in terms of their memory model, data movement, and execution abstraction.

Computers vs. the Human Brain in Machine Learning

© boolan.com Boolan. All rights reserved.
