GTC Taiwan Jensen Keynote

Category: Taiwan Keynote · Year: 2018

Segments (22)

  • 00:00 · Introduction
    • Jensen Huang welcomes the audience to GTC Taiwan 2018.
  • 01:06 · The Rise of GPU Computing
    • Discussion on the end of Moore’s Law and the necessity of GPU-accelerated computing.
  • 04:08 · The Computing Gap
    • Highlighting the massive future demand for computing power that CPUs alone cannot meet.
  • 06:50 · NVIDIA Accelerated Computing Stack
    • Explanation of NVIDIA’s full-stack optimization approach from architecture to applications.
  • 13:20 · GPU-Accelerated HPC Cluster
    • Comparing traditional CPU clusters to GPU clusters in cost, space, and power.
  • 19:40 · AI Training Demand
    • Showcasing the exponential growth in compute required for training neural networks.
  • 23:20 · The Tensor Core GPU
    • Introduction to the Volta architecture and its fusion of HPC and AI computing.
  • 27:20 · NVSwitch
    • Unveiling the high-speed interconnect that allows multiple GPUs to act as one.
  • 29:30 · DGX-2 Announcement
    • Introduction of the DGX-2, the world’s largest GPU.
  • 35:20 · DGX-2 Physical Reveal
    • Jensen Huang physically unveils the 350-pound DGX-2 system on stage.
  • 37:30 · 10X Performance in 6 Months
    • Demonstrating the rapid performance gains achieved through full-stack optimization.
  • 44:30 · 5 Speed Records
    • Highlighting record-setting AI training and inference performance metrics.
  • 46:30 · AI Inference & TensorRT 4
    • Focusing on the challenges of AI inference and the introduction of TensorRT 4.
  • 57:00 · Kubernetes on NVIDIA GPUs
    • Announcing GPU support for Kubernetes to scale out AI workloads.
  • 01:01:00 · PLASTER Framework
    • Introducing a framework for evaluating inference performance.
  • 01:10:00 · Inference Demos
    • Live demonstrations of image recognition and scale-out inference.
  • 01:20:00 · HGX-2 Announcement
    • Unveiling the HGX-2 cloud server platform for hyperscale data centers.
  • 01:30:00 · NVIDIA RTX
    • Introducing real-time ray tracing technology for computer graphics.
  • 01:43:00 · NVIDIA Clara
    • Announcing the Clara medical imaging supercomputer platform.
  • 01:56:00 · NVIDIA Metropolis
    • Discussing AI applications for smart and safe cities.
  • 02:01:00 · NVIDIA DRIVE & Autonomous Vehicles
    • Overview of the end-to-end platform for autonomous driving.
  • 02:06:00 · Project We-kanda Demo
    • A live VR telepresence driving demonstration.

Product Announcements (6)

  • [29:30] DGX-2
    • The world’s largest GPU system, combining 16 Volta GPUs.
    • specs: 2 PFLOPS, 512 GB HBM2 memory, 10 kW power, 350 lbs.
    • availability: $399,000, available in Q3.
  • [46:30] TensorRT 4
    • An optimizing compiler for deep learning inference.
    • specs: Integrates with TensorFlow, ONNX, and accelerates various network types.
    • availability: Not explicitly stated.
  • [57:00] Kubernetes on NVIDIA GPUs
    • Container orchestration support for NVIDIA GPUs.
    • specs: Allows scaling out AI workloads across data centers and clouds.
    • availability: Not explicitly stated.
  • [01:20:00] HGX-2
    • A cloud server platform baseboard.
    • specs: Fuses HPC and AI computing, 2 PFLOPS, uses NVSwitch.
    • availability: Not explicitly stated.
  • [01:30:00] NVIDIA RTX
    • Real-time ray tracing technology.
    • specs: Combines programmable shading, ray tracing, and AI.
    • availability: Not explicitly stated.
  • [01:43:00] NVIDIA Clara
    • A medical imaging supercomputer platform.
    • specs: Virtualizes medical imaging instruments, uses iterative reconstruction and AI.
    • availability: Not explicitly stated.

Specific Numbers (8)

Timestamp | Metric | Value | Context
01:25 | Performance advance | 100,000x | CPU performance advancement over 25 years before Moore’s Law slowed.
02:50 | CUDA developers | 850,000 | Number of CUDA developers globally.
05:45 | Computing demand | 1,000 exaflops | Estimated computing demand by the year 2028.
20:37 | Compute demand increase | 300,000x | Increase in compute required for AI training over a 5-year period (OpenAI data).
33:45 | Performance | 2 PFLOPS | Computing power of the DGX-2 system.
39:31 | Price | $399,000 | Cost of the DGX-2 system.
45:00 | Training speed | 15,500 images/sec | DGX-2 record for ResNet-50 training.
45:20 | Inference latency | 1.1 milliseconds | Record latency for ResNet-50 inference.
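
To put the two growth figures above on a common scale, the following minimal sketch converts each into a doubling time. It assumes smooth exponential growth over the stated window, which the keynote summary does not claim; the function names are illustrative only.

```python
import math


def doubling_time_months(total_growth: float, years: float) -> float:
    """Months per doubling, assuming steady exponential growth over `years`."""
    return years * 12 / math.log2(total_growth)


def annual_growth_factor(total_growth: float, years: float) -> float:
    """Equivalent year-over-year multiplier for the same total growth."""
    return total_growth ** (1 / years)


# Figures taken from the table: 300,000x over 5 years (AI training compute)
# and 100,000x over 25 years (CPU performance).
ai_doubling = doubling_time_months(300_000, 5)
cpu_doubling = doubling_time_months(100_000, 25)
cpu_annual = annual_growth_factor(100_000, 25)

print(f"AI training compute doubles roughly every {ai_doubling:.1f} months")
print(f"CPU performance doubled roughly every {cpu_doubling:.1f} months")
print(f"CPU performance grew about {cpu_annual:.2f}x per year")
```

Under that assumption, the AI training figure corresponds to a doubling every few months, against roughly a year and a half for the historical CPU trend.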

Benchmark Claims (3)

  • [40:30] DGX-2 vs Traditional Hyperscale Cluster: 1 DGX-2
    • vs: 300 Dual-CPU Servers
    • gain: 1/8 the cost, 1/60 the space, 1/18 the power.
  • [37:30] DGX-2 vs DGX-1 Training Time: 1.5 days
    • vs: 15 days
    • gain: 10x faster training in just 6 months of stack optimization.
  • [01:08:00] TensorRT 4 Inference Speedup: Up to 190x
    • vs: CPU-only inference
    • gain: 190x for Image/Video, 50x for NLP, 45x for Recommender systems.
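
The ratios in the first two claims can be unpacked with simple arithmetic. This sketch assumes the "1/8 the cost" and "1/18 the power" fractions are measured against the DGX-2's listed price ($399,000) and power draw (10 kW) from the product announcement; the keynote may have used different baselines, so the implied cluster figures are estimates rather than quoted numbers.

```python
# Figures quoted in this summary.
DGX2_PRICE_USD = 399_000
DGX2_POWER_KW = 10
CPU_SERVERS = 300

# Assumption: the cluster costs 8x and draws 18x what one DGX-2 does.
implied_cluster_cost_usd = DGX2_PRICE_USD * 8
implied_cluster_power_kw = DGX2_POWER_KW * 18

# Per-server figures implied by spreading those totals over 300 servers.
implied_cost_per_server_usd = implied_cluster_cost_usd / CPU_SERVERS
implied_power_per_server_w = implied_cluster_power_kw * 1000 / CPU_SERVERS

# Training-time claim: 15 days reduced to 1.5 days.
training_speedup = 15 / 1.5

print(f"Implied cluster cost:  ${implied_cluster_cost_usd:,}")
print(f"Implied cluster power: {implied_cluster_power_kw} kW")
print(f"Implied cost per CPU server:  ${implied_cost_per_server_usd:,.0f}")
print(f"Implied power per CPU server: {implied_power_per_server_w:.0f} W")
print(f"Training speedup: {training_speedup:.0f}x")
```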

Customer Stories (2)

  • [15:25] Quantum Chemist
    • Used CUDA on consumer GeForce GPUs to run quantum chemistry simulations.
    • outcome: Achieved massive speedups, allowing him to do his life’s work in his lifetime, describing it as a ‘time machine’.
  • [19:57] OpenAI
    • Measured the amount of computation necessary to train state-of-the-art neural networks.
    • outcome: Found a 300,000x increase in compute demand over 5 years.

Key Technologies (6)

  • CUDA: NVIDIA’s parallel computing platform and programming model.
  • Tensor Core: A specialized core that fuses HPC and AI computing, performing mixed-precision matrix math.
  • NVSwitch: A high-speed interconnect switch that allows multiple GPUs to communicate at massive bandwidth.
  • TensorRT: An optimizing compiler and runtime for deep learning inference.
  • Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications.
  • RTX: NVIDIA’s technology for real-time ray tracing, combining rasterization, ray tracing, and AI.
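
The "mixed-precision matrix math" in the Tensor Core entry refers to multiplying half-precision (FP16) inputs while accumulating the sum in single precision (FP32). The sketch below illustrates only that numerical idea, using Python's standard library to emulate the rounding. It is not NVIDIA's hardware behavior or any NVIDIA API, and the helper names are hypothetical.

```python
import struct


def round_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def round_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b, round_acc):
    """Dot product of FP16-rounded inputs; the running sum is rounded by round_acc."""
    acc = 0.0
    for x, y in zip(a, b):
        product = round_fp16(x) * round_fp16(y)  # inputs are rounded to FP16 first
        acc = round_acc(acc + product)           # accumulator precision is the variable here
    return acc


ones = [1.0] * 4096
fp16_accumulated = dot(ones, ones, round_fp16)  # accumulate in FP16
fp32_accumulated = dot(ones, ones, round_fp32)  # accumulate in FP32 (mixed precision)

print(fp16_accumulated, fp32_accumulated)  # prints: 2048.0 4096.0
```

With an FP16 accumulator the sum stalls at 2048, because 2049 is not representable in half precision and rounds back down. The FP32 accumulator reaches the exact answer, which is why keeping the accumulation in higher precision matters for long sums such as those in matrix multiplication.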

Demos Shown (5)

  • [01:10:00] Flower image recognition inference comparing CPU vs GPU performance.
  • [01:16:00] Scale-out AI inference using Kubernetes to dynamically add GPU nodes to handle increased load.
  • [01:38:00] Star Wars Reflections demo showcasing real-time ray tracing using RTX technology.
  • [01:46:00] Clara medical imaging demo comparing CPU vs GPU iterative reconstruction of a CT scan.
  • [02:01:00] Project We-kanda: A VR telepresence demo driving a miniature car and a real car remotely.

Predictions / Commitments (3)

  • [04:57, 10 years] Over the next 10 years, computing demand will grow by more than 100 times.
  • [16:45, Future] Every single supercomputer in the future will be accelerated.
  • [01:51:50, Future] Everything that moves will be autonomous.

Companies Mentioned (5)

TSMC · Google (TensorFlow) · Quanta, Wistron, Foxconn, Inventec · Epic Games, ILM · GE Healthcare, Philips, Siemens, Canon

Notable Quotes (3)

The more you buy, the more you save. — Jensen Huang @ 14:38

We created for him a time machine. — Jensen Huang @ 16:14

There is a new law in town… if you optimize across the entire stack, the performance improvement you can achieve is incredibly fast. — Jensen Huang @ 38:12

Key Topics

GPU Computing · Moore's Law · Supercomputing · Deep Learning Training · Deep Learning Inference · Tensor Core · NVSwitch · DGX-2 · HGX-2 · TensorRT · Kubernetes · Real-time Ray Tracing · Medical Imaging · Autonomous Vehicles

Takeaways

  • CPU scaling has stalled, making GPU-accelerated computing essential for future performance gains.
  • NVIDIA is optimizing across the entire computing stack (chips, systems, software, algorithms) to deliver exponential speedups.
  • The DGX-2, powered by NVSwitch, acts as a single giant GPU to tackle massive AI training workloads.
  • TensorRT 4 and Kubernetes integration make NVIDIA GPUs highly efficient and scalable for AI inference in data centers.
  • NVIDIA RTX brings real-time ray tracing to computer graphics, revolutionizing content creation and gaming.
  • NVIDIA’s platforms are expanding into vertical domains like medical imaging (Clara) and autonomous machines (DRIVE).