GTC Taiwan Jensen Keynote

Category: Taiwan Keynote · Year: 2018

Segments (22)

00:00 · Introduction
- Jensen Huang welcomes the audience to GTC Taiwan 2018.
01:06 · The Rise of GPU Computing
- Discussion on the end of Moore’s Law and the necessity of GPU-accelerated computing.
04:08 · The Computing Gap
- Highlighting the massive future demand for computing power that CPUs alone cannot meet.
06:50 · NVIDIA Accelerated Computing Stack
- Explanation of NVIDIA’s full-stack optimization approach from architecture to applications.
13:20 · GPU-Accelerated HPC Cluster
- Comparing traditional CPU clusters to GPU clusters in cost, space, and power.
19:40 · AI Training Demand
- Showcasing the exponential growth in compute required for training neural networks.
23:20 · The Tensor Core GPU
- Introduction to the Volta architecture and its fusion of HPC and AI computing.
27:20 · NVSwitch
- Unveiling the high-speed interconnect that allows multiple GPUs to act as one.
29:30 · DGX-2 Announcement
- Introduction of the DGX-2, the world’s largest GPU.
35:20 · DGX-2 Physical Reveal
- Jensen Huang physically unveils the 350-pound DGX-2 system on stage.
37:30 · 10X Performance in 6 Months
- Demonstrating the rapid performance gains achieved through full-stack optimization.
44:30 · 5 Speed Records
- Highlighting record-setting AI training and inference performance metrics.
46:30 · AI Inference & TensorRT 4
- Focusing on the challenges of AI inference and the introduction of TensorRT 4.
57:00 · Kubernetes on NVIDIA GPUs
- Announcing GPU support for Kubernetes to scale out AI workloads.
01:01:00 · PLASTER Framework
- Introducing a framework for evaluating inference performance.
01:10:00 · Inference Demos
- Live demonstrations of image recognition and scale-out inference.
01:20:00 · HGX-2 Announcement
- Unveiling the HGX-2 cloud server platform for hyperscale data centers.
01:30:00 · NVIDIA RTX
- Introducing real-time ray tracing technology for computer graphics.
01:43:00 · NVIDIA Clara
- Announcing the Clara medical imaging supercomputer platform.
01:56:00 · NVIDIA Metropolis
- Discussing AI applications for smart and safe cities.
02:01:00 · NVIDIA DRIVE & Autonomous Vehicles
- Overview of the end-to-end platform for autonomous driving.
02:06:00 · Project Wakanda Demo
- A live VR telepresence driving demonstration.

Product Announcements (6)

[29:30] DGX-2
- The world’s largest GPU system, combining 16 Volta GPUs.
- specs: 2 PFLOPS, 512 GB HBM2 memory, 10 kW power, 350 lbs.
- availability: $399,000, available in Q3.
[46:30] TensorRT 4
- An optimizing compiler for deep learning inference.
- specs: Integrates with TensorFlow, ONNX, and accelerates various network types.
- availability: Not explicitly stated.
[57:00] Kubernetes on NVIDIA GPUs
- Container orchestration support for NVIDIA GPUs.
- specs: Allows scaling out AI workloads across data centers and clouds.
- availability: Not explicitly stated.
[01:20:00] HGX-2
- A cloud server platform baseboard.
- specs: Fuses HPC and AI computing, 2 PFLOPS, uses NVSwitch.
- availability: Not explicitly stated.
[01:30:00] NVIDIA RTX
- Real-time ray tracing technology.
- specs: Combines programmable shading, ray tracing, and AI.
- availability: Not explicitly stated.
[01:43:00] NVIDIA Clara
- A medical imaging supercomputer platform.
- specs: Virtualizes medical imaging instruments, uses iterative reconstruction and AI.
- availability: Not explicitly stated.
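As a rough sanity check on the DGX-2 entry above, the system-level figures can be divided across its 16 GPUs. This is a back-of-envelope sketch, not an official breakdown: the per-GPU and per-watt values are derived here rather than stated in the keynote, and the 10 kW figure covers the whole system (CPUs, switches, storage), not the GPUs alone.

```python
# Figures as listed for the DGX-2 in this summary.
NUM_GPUS = 16
TOTAL_PFLOPS = 2.0
TOTAL_HBM2_GB = 512
SYSTEM_POWER_KW = 10.0
PRICE_USD = 399_000


def per_gpu_share(total: float, num_gpus: int = NUM_GPUS) -> float:
    """Split a system-level total evenly across the GPUs."""
    return total / num_gpus


# Derived values (not stated in the keynote summary).
tflops_per_gpu = per_gpu_share(TOTAL_PFLOPS * 1000)   # PFLOPS -> TFLOPS
hbm2_gb_per_gpu = per_gpu_share(TOTAL_HBM2_GB)
usd_per_pflops = PRICE_USD / TOTAL_PFLOPS
# System-level efficiency: GFLOPS divided by watts for the whole box.
gflops_per_watt = (TOTAL_PFLOPS * 1e6) / (SYSTEM_POWER_KW * 1e3)

if __name__ == "__main__":
    print(f"{tflops_per_gpu:.0f} TFLOPS per GPU")
    print(f"{hbm2_gb_per_gpu:.0f} GB HBM2 per GPU")
    print(f"${usd_per_pflops:,.0f} per PFLOPS")
    print(f"{gflops_per_watt:.0f} GFLOPS per watt (system level)")
```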

Specific Numbers (8)

Timestamp	Metric	Value	Context
01:25	Performance advance	100,000x	CPU performance advancement over 25 years before Moore’s Law slowed.
02:50	CUDA Developers	850,000	Number of CUDA developers globally.
05:45	Computing Demand	1,000 Exaflops	Estimated computing demand by the year 2028.
20:37	Compute Demand Increase	300,000x	Increase in compute required for AI training over a 5-year period (OpenAI data).
33:45	Performance	2 PFLOPS	Computing power of the DGX-2 system.
39:31	Price	$399,000	Cost of the DGX-2 system.
45:00	Training Speed	15,500 images/sec	DGX-2 record for ResNet-50 training.
45:20	Inference Latency	1.1 milliseconds	Record latency for ResNet-50 inference.
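Two of the figures in the table lend themselves to quick arithmetic. The sketch below converts the 300,000x growth over 5 years into an implied doubling time, and the 15,500 images/sec rate into a time per training epoch. The epoch calculation assumes the ResNet-50 run used the ImageNet-1k training set of 1,281,167 images; the summary does not name the dataset, so treat that constant as an assumption.

```python
import math

# From the table above.
GROWTH_FACTOR = 300_000      # increase in training compute
PERIOD_YEARS = 5
IMAGES_PER_SEC = 15_500      # DGX-2 ResNet-50 training rate

# Assumption: ImageNet-1k training set size; not stated in the summary.
IMAGENET_TRAIN_IMAGES = 1_281_167


def doubling_time_months(growth: float, years: float) -> float:
    """Months per doubling implied by `growth`-fold increase over `years`."""
    doublings = math.log2(growth)
    return years * 12 / doublings


def seconds_per_epoch(dataset_size: int, images_per_sec: float) -> float:
    """Time for one pass over the dataset at a constant throughput."""
    return dataset_size / images_per_sec


if __name__ == "__main__":
    months = doubling_time_months(GROWTH_FACTOR, PERIOD_YEARS)
    epoch = seconds_per_epoch(IMAGENET_TRAIN_IMAGES, IMAGES_PER_SEC)
    print(f"Implied doubling time: {months:.1f} months")
    print(f"Implied time per epoch: {epoch:.0f} seconds")
```

A doubling time of a few months, against the roughly two-year cadence usually associated with Moore's Law, is the gap the keynote uses to motivate accelerated computing.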

Benchmark Claims (3)

[40:30] DGX-2 vs Traditional Hyperscale Cluster: 1 DGX-2
- vs: 300 Dual-CPU Servers
- gain: 1/8 the cost, 1/60 the space, 1/18 the power.
[37:30] DGX-2 vs DGX-1 Training Time: 1.5 days
- vs: 15 days
- gain: 10x faster training in just 6 months of stack optimization.
[01:08:00] TensorRT 4 Inference Speedup: Up to 190x
- vs: CPU-only inference
- gain: 190x for Image/Video, 50x for NLP, 45x for Recommender systems.
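The ratios in the first two claims can be unpacked with simple arithmetic. The sketch below assumes the comparison used the DGX-2 list price and power figure quoted elsewhere in this summary ($399,000 and 10 kW); the implied cluster and per-server values are derived here and were not stated in the keynote.

```python
# Stated in this summary.
DGX2_PRICE_USD = 399_000
DGX2_POWER_KW = 10
CPU_SERVERS = 300
COST_RATIO = 8       # DGX-2 is 1/8 the cost of the cluster
POWER_RATIO = 18     # DGX-2 is 1/18 the power of the cluster
DGX1_TRAINING_DAYS = 15
DGX2_TRAINING_DAYS = 1.5

# Derived (assumes the ratios were computed from the figures above).
implied_cluster_cost_usd = DGX2_PRICE_USD * COST_RATIO
implied_cost_per_server_usd = implied_cluster_cost_usd / CPU_SERVERS
implied_cluster_power_kw = DGX2_POWER_KW * POWER_RATIO
implied_watts_per_server = implied_cluster_power_kw * 1000 / CPU_SERVERS
training_speedup = DGX1_TRAINING_DAYS / DGX2_TRAINING_DAYS

if __name__ == "__main__":
    print(f"Implied cluster cost: ${implied_cluster_cost_usd:,}")
    print(f"Implied cost per server: ${implied_cost_per_server_usd:,.0f}")
    print(f"Implied cluster power: {implied_cluster_power_kw} kW")
    print(f"Implied power per server: {implied_watts_per_server:.0f} W")
    print(f"Training speedup: {training_speedup:.0f}x")
```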

Customer Stories (2)

[15:25] Quantum Chemist
- Used CUDA on consumer GeForce GPUs to run quantum chemistry simulations.
- outcome: Achieved massive speedups that let him complete his life’s work within his lifetime; Huang described the result as a ‘time machine’.
[19:57] OpenAI
- Measured the amount of computation necessary to train state-of-the-art neural networks.
- outcome: Found a 300,000x increase in compute demand over 5 years.

Key Technologies (6)

CUDA: NVIDIA’s parallel computing platform and programming model.
Tensor Core: A specialized core that fuses HPC and AI computing, performing mixed-precision matrix math.
NVSwitch: A high-speed interconnect switch that allows multiple GPUs to communicate at massive bandwidth.
TensorRT: An optimizing compiler and runtime for deep learning inference.
Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications.
RTX: NVIDIA’s technology for real-time ray tracing, combining rasterization, ray tracing, and AI.
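To make "mixed-precision matrix math" concrete, here is a minimal pure-Python sketch of a fused multiply-add D = A x B + C with half-precision inputs and single-precision accumulation. It assumes the FP16-multiply, FP32-accumulate scheme that NVIDIA documents for Volta Tensor Cores; the summary itself gives no precision details. The helper names are hypothetical, and the code emulates the rounding behaviour only, not the hardware.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_fma(a, b, c):
    """Compute D = A x B + C with FP16 inputs and FP32-rounded accumulation.

    a is n x k, b is k x m, c is n x m, all as lists of lists of floats.
    """
    n, k, m = len(a), len(b), len(b[0])
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = to_fp32(c[i][j])
            for p in range(k):
                # Inputs are rounded to FP16 before multiplying.
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                # Round after each add to approximate FP32 accumulation.
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    bias = [[0.5, 0.5], [0.5, 0.5]]
    print(mixed_precision_fma(a, identity, bias))
    # 0.1 is not exactly representable in FP16, so rounding is visible.
    print(to_fp16(0.1))
```

Keeping the accumulator wider than the inputs is the design point: rounding error from the narrow inputs does not compound across the many additions in each dot product.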

Demos Shown (5)

[01:10:00] Flower image recognition inference comparing CPU vs GPU performance.
[01:16:00] Scale-out AI inference using Kubernetes to dynamically add GPU nodes to handle increased load.
[01:38:00] Star Wars Reflections demo showcasing real-time ray tracing using RTX technology.
[01:46:00] Clara medical imaging demo comparing CPU vs GPU iterative reconstruction of a CT scan.
[02:01:00] Project Wakanda: A VR telepresence demo driving a miniature car and a real car remotely.

Predictions / Commitments (3)

[04:57, 10 years] Over the next 10 years, computing demand will grow by more than 100 times.
[16:45, Future] Every single supercomputer in the future will be accelerated.
[01:51:50, Future] Everything that moves will be autonomous.

Companies Mentioned (5)

TSMC · Google (TensorFlow) · Quanta, Wistron, Foxconn, Inventec · Epic Games, ILM · GE Healthcare, Philips, Siemens, Canon

Notable Quotes (3)

The more you buy, the more you save. — Jensen Huang @ 14:38

We created for him a time machine. — Jensen Huang @ 16:14

There is a new law in town… if you optimize across the entire stack, the performance improvement you can achieve is incredibly fast. — Jensen Huang @ 38:12

Key Topics

GPU Computing · Moore's Law · Supercomputing · Deep Learning Training · Deep Learning Inference · Tensor Core · NVSwitch · DGX-2 · HGX-2 · TensorRT · Kubernetes · Real-time Ray Tracing · Medical Imaging · Autonomous Vehicles

Takeaways

CPU scaling has stalled, making GPU-accelerated computing essential for future performance gains.
NVIDIA is optimizing across the entire computing stack (chips, systems, software, algorithms) to deliver exponential speedups.
The DGX-2, powered by NVSwitch, acts as a single giant GPU to tackle massive AI training workloads.
TensorRT 4 and Kubernetes integration make NVIDIA GPUs highly efficient and scalable for AI inference in data centers.
NVIDIA RTX brings real-time ray tracing to computer graphics, revolutionizing content creation and gaming.
NVIDIA’s platforms are expanding into vertical domains like medical imaging (Clara) and autonomous machines (DRIVE).