Impressions from NVIDIA GTC 2019

Arthur Chan, March 26, 2019

Inside every serious deep learning house, there is a cluster of machines. Inside each machine, there is at least one GPU card. Since Voci is a serious deep learning house, we have ended up owning many GPU cards.

By now, no one would disagree that deep learning has reinvigorated the ASR industry. Back in 2013, Voci was one of the earliest startups to adopt deep learning, at a time when Hinton's seminal paper [1] was still fresh. Some brave souls at Voci, including Haozheng Li, Edward Lin and John Kominek, decided to jump straight into this then-radically new approach. My hybrid role, part researcher and part software maintainer, also started then. We have done several other things at Voci, but none of them has been as powerful as deep learning.

But I digress. Where were we?

Voci has a lot of GPU cards. At first you might have the impression that a GPU is just a "parallelizable CPU". But in reality, because a GPU is specifically made for high-performance computing applications such as graphics rendering, it has a very different design from a CPU. If you are a C programmer, you might be thinking of Compute Unified Device Architecture (or CUDA, as NVIDIA loves acronyms). But your intuition, developed over years of programming CPUs (Intel or Intel-like), would be completely wrong.

We realized all this at Voci. That's why part of our focus is understanding how GPUs work, and that's why my boss, John Kominek, and I decided to travel to Silicon Valley and attend GTC (the GPU Technology Conference) 2019.

This article covers the keynote by Jensen Huang, the poster session, and the various booths.

Huang's Keynotes

[Image: Jensen Huang]

We love Jensen Huang! He walked all around the stage, enthusiastically explaining to the ten-thousand-strong audience what's new at NVIDIA.

Here is my round-up of the top announcements:

CUDA-X: This is a convergence of different technologies within NVIDIA. CUDA, as we now know it, is more like a programming language, whereas CUDA-X is more of an architecture term within NVIDIA, one that encompasses various other technologies such as RTX, HPC and AI.
Mellanox Acquisition: When you look at it, NVIDIA's strength against its competitors is not just GPU cards. NVIDIA also has the infrastructure that lets customers build systems out of many GPU cards. So of course the question arises: how do you use multiple machines, each with its own cards? That explains the Mellanox deal (InfiniBand?). It also explains why Huang spent the lion's share of his time talking about data centers, how different containers talk to each other, and how that generates traffic. In a way, it is not just about the card. It is about the card and all its peripherals, and about the machines and their ecosystem.
The T4 GPU: The way NVIDIA markets it, the T4 is suited to data centers that focus on AI. Current benchmarks suggest that a T4 is slower than a V100 but more energy efficient. This year's big news on the server side is that AWS has now adopted the T4 in its GPU instances.
Automatic Mixed Precision (AMP): News for us techies! The most interesting part is that AMP is now available on Tensor Cores. Once you create a production system for either training or inference, the first thing you will realize is that it takes a lot of GPU memory. Reducing numerical precision is one way to cut that down. But when you reduce precision, the quality of your task (training or inference) may degrade. So it's a tricky problem. A couple of years ago, researchers figured out a couple of methods to deal with it. You can implement them yourself, but NVIDIA has decided to support them directly on Tensor Cores.
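
For the techies, here is a minimal sketch of why reduced precision is tricky and what the workarounds might look like. It assumes the methods alluded to in the AMP item are loss scaling and keeping a higher-precision master copy of the weights, which are the commonly cited mixed-precision training techniques. The helper names `to_fp16` and `scaled_fp16_grad` and all the numbers are made up for illustration. This is plain Python emulating half precision with the standard `struct` module, not NVIDIA's AMP implementation.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_fp16_grad(grad: float, loss_scale: float) -> float:
    """Hypothetical helper: store grad * loss_scale in fp16, then unscale.

    The division happens in Python's native float (float64), which stands
    in here for the fp32 arithmetic a real mixed-precision setup would use.
    """
    return to_fp16(grad * loss_scale) / loss_scale


# Memory: half precision takes half the bytes of single precision.
bytes_fp16 = struct.calcsize("<e")
bytes_fp32 = struct.calcsize("<f")

# Problem 1: a tiny gradient underflows when stored directly in fp16.
tiny_grad = 1e-8  # illustrative value, not from any real model
naive = to_fp16(tiny_grad)

# Workaround (loss scaling): scale up before the fp16 store, scale down after.
recovered = scaled_fp16_grad(tiny_grad, loss_scale=2.0 ** 16)

# Problem 2: a small update vanishes when added to an fp16 weight.
weight, update = 1.0, 1e-4  # illustrative values
fp16_weight = to_fp16(to_fp16(weight) + to_fp16(update))

# Workaround (master weights): accumulate updates in higher precision.
master_weight = weight + update

print("bytes per value (fp16, fp32):", bytes_fp16, bytes_fp32)
print("gradient stored naively in fp16:", naive)
print("gradient with loss scaling:", recovered)
print("weight after fp16 update:", fp16_weight)
print("weight after higher-precision update:", master_weight)
```

The point of the sketch is only that halving the bytes per value comes with a much narrower numeric range, so small gradients and small updates need special handling. Doing that handling automatically is the pitch behind AMP.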

Oh, FYI, keynotes feel like a party.

[Image: GTC Keynote]

Booths

In a large conference like GTC, you can learn many interesting aspects of technology. Unlike a purely academic conference, GTC is also a tradeshow. Here are some impressions:

All GPU peripherals: Once you get a GPU card, perhaps the biggest problem is how to install it and make it usable. It should be plug and play, right? Nope. In reality, working with physical GPU cards is a very difficult technical problem. Part of the issue is heat dissipation. If you don't believe me, go and put a few consumer-grade GPU cards into the same box. You can use it as a heater in the Boston winter! That may be why so many vendors other than NVIDIA are trying to get into the game of building GPU-based servers. They probably accounted for one-third of the booths at the show.
Self-Driving Car/LiDAR: I don't envy my colleagues in the SDC industry. When will we actually see Level 4 self-driving? People definitely want to see SDCs in the near future. That's why you see so many SDC vendors at conferences like this.
The Ecosystem: Finally, there are also demonstrations of various cloud systems which use GPUs.

There were more than 100 vendors showcasing their AI products. If you visit all the booths, you are going to get very hungry. So, donuts!

[Image: Donuts]

[1] The paper was actually jointly written by researchers from Google, IBM and Microsoft. Notice that these researchers were from separate (and rival) groups and seldom wrote joint papers, never mind achieving ground-breaking results together.

Arthur Chan

Arthur Chan is a Principal Speech Architect at Voci. He hacks speech recognizers and machine learning algorithms for a living. He also manages the Facebook Group, Artificial Intelligence and Deep Learning.
