From b108eedc04214ed604892d1368398bebfd6376c1 Mon Sep 17 00:00:00 2001
From: Jacob Kelly <jacobjinkelly@google.com>
Date: Mon, 18 Nov 2024 16:53:42 +0000
Subject: [PATCH] Add a table of H100 inference timings to docs/performance.md

PiperOrigin-RevId: 697644061
---
 docs/performance.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

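As a quick sanity check on the new table, here is a minimal Python sketch that computes the per-row A100/H100 speedup from the timings in the diff below. The names `TIMINGS` and `h100_speedups` are hypothetical illustrations and are not part of the repository.

```python
# Inference timings copied from the table added to docs/performance.md (seconds).
TIMINGS = {
    # num_tokens: (a100_seconds, h100_seconds)
    1024: (62, 34),
    2048: (275, 144),
    3072: (703, 367),
    4096: (1434, 774),
    5120: (2547, 1416),
}


def h100_speedups(timings):
    """Return {num_tokens: A100 time / H100 time} for each table row."""
    return {n: a100 / h100 for n, (a100, h100) in timings.items()}


if __name__ == "__main__":
    for n, s in h100_speedups(TIMINGS).items():
        print(f"{n:>5} tokens: H100 is {s:.2f}x faster than A100")
```

Across all rows of the table, the H100 comes out roughly 1.8x to 1.9x faster than the A100.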
diff --git a/docs/performance.md b/docs/performance.md
index 29a8a97..1f288e1 100644
--- a/docs/performance.md
+++ b/docs/performance.md
@@ -66,6 +66,17 @@ them for numerical accuracy and throughput efficiency:
 -   1 NVIDIA A100 (80 GB)
 -   1 NVIDIA H100 (80 GB)
 
+We compare compile-free inference timings (i.e. excluding compilation time)
+for both configurations in the following table:
+
+Num Tokens | 1 A100 80 GB (seconds) | 1 H100 80 GB (seconds)
+:--------- | ---------------------: | ---------------------:
+1024       | 62                     | 34
+2048       | 275                    | 144
+3072       | 703                    | 367
+4096       | 1434                   | 774
+5120       | 2547                   | 1416
+
 ### Other Hardware Configurations
 
 #### NVIDIA A100 (40 GB)
-- 