From b108eedc04214ed604892d1368398bebfd6376c1 Mon Sep 17 00:00:00 2001
From: Jacob Kelly <jacobjinkelly@google.com>
Date: Mon, 18 Nov 2024 16:53:42 +0000
Subject: [PATCH] Add a table of H100 inference timings to docs/performance.md

PiperOrigin-RevId: 697644061
---
 docs/performance.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/performance.md b/docs/performance.md
index 29a8a97..1f288e1 100644
--- a/docs/performance.md
+++ b/docs/performance.md
@@ -66,6 +66,17 @@ them for numerical accuracy and throughput efficiency:
 - 1 NVIDIA A100 (80 GB)
 - 1 NVIDIA H100 (80 GB)
 
+We compare compile-free inference timings of both configurations in the
+following table:
+
+Num Tokens | 1 A100 80 GB (seconds) | 1 H100 80 GB (seconds)
+:--------- | ---------------------: | ---------------------:
+1024       | 62                     | 34
+2048       | 275                    | 144
+3072       | 703                    | 367
+4096       | 1434                   | 774
+5120       | 2547                   | 1416
+
 ### Other Hardware Configurations
 
 #### NVIDIA A100 (40 GB)
-- 
GitLab