diff --git a/.gitignore b/.gitignore
index 8dbad9d56955289549db067811f4ca2f99bc854f..eabb5601cd70f2249b91b1ee48174f32b2b2c19f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,4 +2,5 @@
 docs/_build/
 docs/build/
 docs/**/*.mo
-.vscode
\ No newline at end of file
+.vscode
+.idea
diff --git a/docs/source/benchmark/speed_benchmark.rst b/docs/source/benchmark/speed_benchmark.rst
index 7555ff1610c956ec7c9150174032328eb29d8d3e..297cf717e3ae22e90fd30903be3844937871e59e 100644
--- a/docs/source/benchmark/speed_benchmark.rst
+++ b/docs/source/benchmark/speed_benchmark.rst
@@ -1,455 +1,682 @@
-Speed Benchmark
+Qwen2.5 Speed Benchmark
 =========================
 
-.. attention:: 
-    To be updated for Qwen2.5.
 
-This section reports the speed performance of bf16 models, quantized models 
-(including GPTQ-Int4, GPTQ-Int8 and AWQ) of the Qwen2 series. Specifically, we
+This section reports the speed performance of bf16 and quantized models
+(including GPTQ-Int4, GPTQ-Int8 and AWQ) of the Qwen2.5 series. Specifically, we
 report the inference speed (tokens/s) as well as memory footprint (GB)
 under the conditions of different context lengths.
 
 The environment of the evaluation with huggingface transformers is:
 
 -  NVIDIA A100 80GB
--  CUDA 11.8
--  Pytorch 2.1.2+cu118
--  Flash Attention 2.3.3
--  Transformers 4.38.2
--  AutoGPTQ 0.7.1
--  AutoAWQ 0.2.4
+-  CUDA 12.1
+-  torch==2.3.1
+-  flash_attn==2.5.8
+-  transformers==4.46.0
+-  auto_gptq==0.7.1+cu1210 (compiled from source)
+-  autoawq==0.2.6
+
 
 The environment of the evaluation with vLLM is:
 
 -  NVIDIA A100 80GB
--  CUDA 11.8
--  Pytorch 2.3.0+cu118
--  Flash Attention 2.5.6
--  Transformers 4.40.1
--  vLLM 0.4.2
+-  CUDA 12.1
+-  vllm==0.6.3
+-  torch==2.4.0
+-  flash_attn==2.6.3
+-  transformers==4.46.0
+
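+For a quick check that a local setup matches one of the environments
+above, the pinned packages can be queried programmatically. This is a
+minimal sketch, not part of the benchmark itself; the distribution names
+follow the pins above, and some installs may register dashed names
+(e.g. ``flash-attn``):
+
+.. code-block:: python
+
+   # Print the installed versions of the packages pinned above so a local
+   # run can be compared against the benchmark environment.
+   from importlib.metadata import PackageNotFoundError, version
+
+   for pkg in ("torch", "flash_attn", "transformers", "auto_gptq", "autoawq", "vllm"):
+       try:
+           print(f"{pkg}=={version(pkg)}")
+       except PackageNotFoundError:
+           print(f"{pkg} is not installed")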
 
-Note:
+Notes:
 
-- We use the batch size of 1 and the least number of GPUs as
-  possible for the evalution.
+- We use a batch size of 1 and as few GPUs as possible for the
+  evaluation.
 - We test the speed and memory of generating 2048 tokens with 
   the input lengths of 1, 6144, 14336, 30720, 63488, and 129024 
-  tokens (\>32k is only avaliable for Qwen2-72B-Instuct and Qwen2-7B-Instuct).
+  tokens.
 - For vLLM, the memory usage is not reported because it pre-allocates
   all GPU memory. We use ``gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False``
   by default.
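+
+As an illustration of the measurement procedure above, the transformers
+numbers correspond roughly to the following sketch (simplified, not the
+exact benchmark script; the model name, input length, and random input
+are placeholders):
+
+.. code-block:: python
+
+   # Sketch: batch size 1, a fixed-length input, generate 2048 new tokens,
+   # then report tokens/s and the peak GPU memory in GB.
+   import time
+
+   import torch
+   from transformers import AutoModelForCausalLM, AutoTokenizer
+
+   model_name = "Qwen/Qwen2.5-7B-Instruct"  # any model from the tables below
+   tokenizer = AutoTokenizer.from_pretrained(model_name)
+   model = AutoModelForCausalLM.from_pretrained(
+       model_name,
+       torch_dtype=torch.bfloat16,
+       attn_implementation="flash_attention_2",
+       device_map="cuda",
+   )
+
+   context_length = 6144  # one of the tested input lengths
+   input_ids = torch.randint(
+       0, tokenizer.vocab_size, (1, context_length), device=model.device
+   )
+
+   torch.cuda.reset_peak_memory_stats()
+   start = time.perf_counter()
+   output = model.generate(
+       input_ids,
+       max_new_tokens=2048,
+       min_new_tokens=2048,  # always generate the full 2048 tokens
+       do_sample=False,
+   )
+   elapsed = time.perf_counter() - start
+
+   new_tokens = output.shape[1] - context_length
+   print(f"speed: {new_tokens / elapsed:.2f} tokens/s")
+   print(f"memory: {torch.cuda.max_memory_allocated() / 1024 ** 3:.2f} GB")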
 
 
+
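+The vLLM defaults quoted above correspond roughly to constructing the
+engine as in the sketch below (the model name and prompt are
+illustrative; ``SamplingParams`` is shown only to fix the 2048-token
+generation length):
+
+.. code-block:: python
+
+   from vllm import LLM, SamplingParams
+
+   llm = LLM(
+       model="Qwen/Qwen2.5-7B-Instruct",  # any model from the tables below
+       gpu_memory_utilization=0.9,        # vLLM pre-allocates 90% of GPU memory
+       max_model_len=32768,
+       enforce_eager=False,               # keep CUDA graphs enabled
+   )
+   sampling_params = SamplingParams(max_tokens=2048, ignore_eos=True)
+   outputs = llm.generate(["An example prompt."], sampling_params)
+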
 -  0.5B (Transformer)
 
-+---------------------+--------------+--------------+---------+-----------------+----------------+
-| Model               | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
-+=====================+==============+==============+=========+=================+================+
-| Qwen2-0.5B-Instruct | 1            | BF16         | 1       | 49.94           | 1.17           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 36.35           | 0.85           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 49.56           | 0.68           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 38.78           | 0.68           |
-+                     +--------------+--------------+---------+-----------------+----------------+
-|                     | 6144         | BF16         | 1       | 50.83           | 6.42           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 36.56           | 6.09           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 49.63           | 5.93           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 38.73           | 5.92           |
-+                     +--------------+--------------+---------+-----------------+----------------+
-|                     | 14336        | BF16         | 1       | 49.56           | 13.48          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 36.23           | 13.15          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 48.68           | 12.97          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 38.94           | 12.99          |
-+                     +--------------+--------------+---------+-----------------+----------------+
-|                     | 30720        | BF16         | 1       | 49.25           | 27.61          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 34.61           | 27.28          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 48.18           | 27.12          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 38.19           | 27.11          |
-+---------------------+--------------+--------------+---------+-----------------+----------------+
++-------------------------+--------------+--------------+---------+-----------------+----------------+
+| Model                   | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
++=========================+==============+==============+=========+=================+================+
+| Qwen2.5-0.5B-Instruct   | 1            | BF16         | 1       | 47.40           | 0.97           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int8    | 1       | 35.17           | 0.64           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int4    | 1       | 50.60           | 0.48           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | AWQ          | 1       | 37.09           | 0.68           |
++                         +--------------+--------------+---------+-----------------+----------------+
+|                         | 6144         | BF16         | 1       | 47.45           | 1.23           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int8    | 1       | 36.47           | 0.90           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int4    | 1       | 48.89           | 0.73           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | AWQ          | 1       | 37.04           | 0.72           |
++                         +--------------+--------------+---------+-----------------+----------------+
+|                         | 14336        | BF16         | 1       | 47.11           | 1.60           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int8    | 1       | 35.44           | 1.26           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int4    | 1       | 48.26           | 1.10           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | AWQ          | 1       | 37.14           | 1.10           |
++                         +--------------+--------------+---------+-----------------+----------------+
+|                         | 30720        | BF16         | 1       | 47.16           | 2.34           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int8    | 1       | 36.25           | 2.01           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | GPTQ-Int4    | 1       | 49.22           | 1.85           |
++                         +              +--------------+---------+-----------------+----------------+
+|                         |              | AWQ          | 1       | 36.90           | 1.84           |
++-------------------------+--------------+--------------+---------+-----------------+----------------+
 
--  0.5B (vLLM)
 
-+---------------------+--------------+--------------+---------+-----------------+
-| Model               | Input Length | Quantization | GPU Num | Speed(tokens/s) |
-+=====================+==============+==============+=========+=================+
-| Qwen2-0.5B-Instruct | 1            | BF16         | 1       | 270.49          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 235.95          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 240.07          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 233.31          |
-+                     +--------------+--------------+---------+-----------------+
-|                     | 6144         | BF16         | 1       | 256.16          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 224.30          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 226.41          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 222.83          |
-+                     +--------------+--------------+---------+-----------------+
-|                     | 14336        | BF16         | 1       | 108.89          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 108.10          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 106.51          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 104.16          |
-+                     +--------------+--------------+---------+-----------------+
-|                     | 30720        | BF16         | 1       | 97.20           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 94.49           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 93.94           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 92.23           |
-+---------------------+--------------+--------------+---------+-----------------+
 
+-  0.5B (vLLM)
 
--  1.5B (Transformer)
++-------------------------+--------------+--------------+---------+-----------------+
+| Model                   | Input Length | Quantization | GPU Num | Speed(tokens/s) |
++=========================+==============+==============+=========+=================+
+| Qwen2.5-0.5B-Instruct   | 1            | BF16         | 1       | 311.55          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int8    | 1       | 257.07          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int4    | 1       | 260.93          |
++                         +              +--------------+---------+-----------------+
+|                         |              | AWQ          | 1       | 261.95          |
++                         +--------------+--------------+---------+-----------------+
+|                         | 6144         | BF16         | 1       | 304.79          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int8    | 1       | 254.10          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int4    | 1       | 257.33          |
++                         +              +--------------+---------+-----------------+
+|                         |              | AWQ          | 1       | 259.80          |
++                         +--------------+--------------+---------+-----------------+
+|                         | 14336        | BF16         | 1       | 290.28          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int8    | 1       | 243.69          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int4    | 1       | 247.01          |
++                         +              +--------------+---------+-----------------+
+|                         |              | AWQ          | 1       | 249.58          |
++                         +--------------+--------------+---------+-----------------+
+|                         | 30720        | BF16         | 1       | 264.51          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int8    | 1       | 223.86          |
++                         +              +--------------+---------+-----------------+
+|                         |              | GPTQ-Int4    | 1       | 226.50          |
++                         +              +--------------+---------+-----------------+
+|                         |              | AWQ          | 1       | 229.84          |
++-------------------------+--------------+--------------+---------+-----------------+
 
-+---------------------+--------------+--------------+---------+-----------------+----------------+
-| Model               | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
-+=====================+==============+==============+=========+=================+================+
-| Qwen2-1.5B-Instruct | 1            | BF16         | 1       | 40.89           | 3.44           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 31.51           | 2.31           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 42.47           | 1.67           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 33.62           | 1.64           |
-+                     +--------------+--------------+---------+-----------------+----------------+
-|                     | 6144         | BF16         | 1       | 40.86           | 8.74           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 31.31           | 7.59           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 42.78           | 6.95           |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 32.90           | 6.92           |
-+                     +--------------+--------------+---------+-----------------+----------------+
-|                     | 14336        | BF16         | 1       | 40.08           | 15.92          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 31.19           | 14.79          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 42.25           | 14.14          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 33.24           | 14.12          |
-+                     +--------------+--------------+---------+-----------------+----------------+
-|                     | 30720        | BF16         | 1       | 34.09           | 30.31          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int8    | 1       | 28.52           | 29.18          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | GPTQ-Int4    | 1       | 31.30           | 28.54          |
-+                     +              +--------------+---------+-----------------+----------------+
-|                     |              | AWQ          | 1       | 32.16           | 28.51          |
-+---------------------+--------------+--------------+---------+-----------------+----------------+
 
--  1.5B (vLLM)
 
-+---------------------+--------------+--------------+---------+-----------------+
-| Model               | Input Length | Quantization | GPU Num | Speed(tokens/s) |
-+=====================+==============+==============+=========+=================+
-| Qwen2-1.5B-Instruct | 1            | BF16         | 1       | 175.55          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 172.28          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 184.58          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 170.87          |
-+                     +--------------+--------------+---------+-----------------+
-|                     | 6144         | BF16         | 1       | 166.23          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 164.32          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 174.04          |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 162.81          |
-+                     +--------------+--------------+---------+-----------------+
-|                     | 14336        | BF16         | 1       | 83.67           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 98.63           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 97.65           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 92.48           |
-+                     +--------------+--------------+---------+-----------------+
-|                     | 30720        | BF16         | 1       | 77.69           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int8    | 1       | 86.42           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | GPTQ-Int4    | 1       | 87.49           |
-+                     +              +--------------+---------+-----------------+
-|                     |              | AWQ          | 1       | 82.88           |
-+---------------------+--------------+--------------+---------+-----------------+
+-  1.5B (Transformer)
 
++--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
+| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
++==========================+==============+==============+=========+=================+================+=========================+
+| Qwen2.5-1.5B-Instruct    | 1            | BF16         | 1       | 39.68           | 2.95           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 32.62           | 1.82           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 43.33           | 1.18           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 31.70           | 1.51           |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 6144         | BF16         | 1       | 40.88           | 3.43           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 31.46           | 2.30           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 43.96           | 1.66           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 32.30           | 1.63           |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 14336        | BF16         | 1       | 40.43           | 4.16           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 31.06           | 3.03           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 43.66           | 2.39           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 32.39           | 2.36           |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 30720        | BF16         | 1       | 38.59           | 5.62           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 31.04           | 4.49           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 35.68           | 3.85           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 31.95           | 3.82           |                         |
++--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
 
--  7B (Transformer)
 
-+-------------------+--------------+--------------+---------+-----------------+----------------+
-| Model             | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
-+===================+==============+==============+=========+=================+================+
-| Qwen2-7B-Instruct | 1            | BF16         | 1       | 37.97           | 14.92          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int8    | 1       | 30.85           | 8.97           |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int4    | 1       | 36.17           | 6.06           |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | AWQ          | 1       | 33.08           | 5.93           |
-+                   +--------------+--------------+---------+-----------------+----------------+
-|                   | 6144         | BF16         | 1       | 34.74           | 20.26          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int8    | 1       | 31.13           | 14.31          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int4    | 1       | 33.34           | 11.40          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | AWQ          | 1       | 30.86           | 11.27          |
-+                   +--------------+--------------+---------+-----------------+----------------+
-|                   | 14336        | BF16         | 1       | 26.63           | 27.71          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int8    | 1       | 24.58           | 21.76          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int4    | 1       | 25.81           | 18.86          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | AWQ          | 1       | 27.61           | 18.72          |
-+                   +--------------+--------------+---------+-----------------+----------------+
-|                   | 30720        | BF16         | 1       | 17.49           | 42.62          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int8    | 1       | 16.69           | 36.67          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | GPTQ-Int4    | 1       | 17.17           | 33.76          |
-+                   +              +--------------+---------+-----------------+----------------+
-|                   |              | AWQ          | 1       | 17.87           | 33.63          |
-+-------------------+--------------+--------------+---------+-----------------+----------------+
+-  1.5B (vLLM)
 
++--------------------------+--------------+--------------+---------+-----------------+
+| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) |
++==========================+==============+==============+=========+=================+
+| Qwen2.5-1.5B-Instruct    | 1            | BF16         | 1       | 183.33          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 201.67          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 217.03          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 213.74          |
++                          +--------------+--------------+---------+-----------------+
+|                          | 6144         | BF16         | 1       | 176.68          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 192.83          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 206.63          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 203.64          |
++                          +--------------+--------------+---------+-----------------+
+|                          | 14336        | BF16         | 1       | 168.69          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 183.69          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 195.88          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 192.64          |
++                          +--------------+--------------+---------+-----------------+
+|                          | 30720        | BF16         | 1       | 152.04          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 162.82          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 173.57          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 170.20          |
++--------------------------+--------------+--------------+---------+-----------------+
 
--  7B (vLLM)
 
-+-------------------+--------------+--------------+---------+-----------------+
-| Model             | Input Length | Quantization | GPU Num | Speed(tokens/s) |
-+===================+==============+==============+=========+=================+
-| Qwen2-7B-Instruct | 1            | BF16         | 1       | 80.45           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int8    | 1       | 114.32          |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int4    | 1       | 143.40          |
-+                   +              +--------------+---------+-----------------+
-|                   |              | AWQ          | 1       | 96.65           |
-+                   +--------------+--------------+---------+-----------------+
-|                   | 6144         | BF16         | 1       | 76.41           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int8    | 1       | 107.02          |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int4    | 1       | 131.55          |
-+                   +              +--------------+---------+-----------------+
-|                   |              | AWQ          | 1       | 91.38           |
-+                   +--------------+--------------+---------+-----------------+
-|                   | 14336        | BF16         | 1       | 66.54           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int8    | 1       | 89.72           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int4    | 1       | 97.93           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | AWQ          | 1       | 76.87           |
-+                   +--------------+--------------+---------+-----------------+
-|                   | 30720        | BF16         | 1       | 55.83           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int8    | 1       | 71.58           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int4    | 1       | 81.48           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | AWQ          | 1       | 63.62           |
-+                   +--------------+--------------+---------+-----------------+
-|                   | 63488        | BF16         | 1       | 41.20           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int8    | 1       | 49.37           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int4    | 1       | 54.12           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | AWQ          | 1       | 45.89           |
-+                   +--------------+--------------+---------+-----------------+
-|                   | 129024       | BF16         | 1       | 25.01           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int8    | 1       | 27.73           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | GPTQ-Int4    | 1       | 29.39           |
-+                   +              +--------------+---------+-----------------+
-|                   |              | AWQ          | 1       | 27.13           |
-+-------------------+--------------+--------------+---------+-----------------+
-
-
-- 57B-A14B (Transformer)
-
-+--------------------------+--------------+--------------+---------+-----------------+----------------+
-| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
-+==========================+==============+==============+=========+=================+================+
-| Qwen2-57B-A14B-Instruct  | 1            | BF16         | 2       | 4.76            | 110.29         |
-+                          +              +--------------+---------+-----------------+----------------+
-|                          |              | GPTQ-Int4    | 1       | 5.55            | 30.38          |
-+                          +--------------+--------------+---------+-----------------+----------------+
-|                          | 6144         | BF16         | 2       | 4.90            | 117.80         |
-+                          +              +--------------+---------+-----------------+----------------+
-|                          |              | GPTQ-Int4    | 1       | 5.44            | 35.67          |
-+                          +--------------+--------------+---------+-----------------+----------------+
-|                          | 14336        | BF16         | 2       | 4.58            | 128.17         |
-+                          +              +--------------+---------+-----------------+----------------+
-|                          |              | GPTQ-Int4    | 1       | 5.31            | 43.11          |
-+                          +--------------+--------------+---------+-----------------+----------------+
-|                          | 30720        | BF16         | 2       | 4.12            | 163.77         |
-+                          +              +--------------+---------+-----------------+----------------+
-|                          |              | GPTQ-Int4    | 1       | 4.72            | 58.01          |
-+--------------------------+--------------+--------------+---------+-----------------+----------------+
-
-- 57B-A14B (vLLM)
+
+-  3B (Transformer)
+
++--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
+| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
++==========================+==============+==============+=========+=================+================+=========================+
+| Qwen2.5-3B-Instruct      | 1            | BF16         | 1       | 30.80           | 5.95           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 25.69           | 3.38           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 35.21           | 2.06           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 25.29           | 2.50           |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 6144         | BF16         | 1       | 32.20           | 6.59           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 24.69           | 3.98           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 34.47           | 2.67           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 24.86           | 2.62           |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 14336        | BF16         | 1       | 31.72           | 7.47           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 24.70           | 4.89           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 34.36           | 3.58           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 25.19           | 3.54           |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 30720        | BF16         | 1       | 25.37           | 9.30           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 21.67           | 6.72           | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 23.60           | 5.41           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 24.56           | 5.37           |                         |
++--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
+
+
+-  3B (vLLM)
 
 +--------------------------+--------------+--------------+---------+-----------------+
 | Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) |
 +==========================+==============+==============+=========+=================+
-| Qwen2-57B-A14B-Instruct  | 1            | BF16         | 2       | 31.44           |
-+--------------------------+--------------+--------------+---------+-----------------+
-|                          | 6144         | BF16         | 2       | 31.77           |
-+--------------------------+--------------+--------------+---------+-----------------+
-|                          | 14336        | BF16         | 2       | 21.25           |
-+--------------------------+--------------+--------------+---------+-----------------+
-|                          | 30720        | BF16         | 2       | 20.24           |
+| Qwen2.5-3B-Instruct      | 1            | BF16         | 1       | 127.61          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 150.02          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 168.20          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 165.50          |
++                          +--------------+--------------+---------+-----------------+
+|                          | 6144         | BF16         | 1       | 123.15          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 143.09          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 159.85          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 156.38          |
++                          +--------------+--------------+---------+-----------------+
+|                          | 14336        | BF16         | 1       | 117.35          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 135.50          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 149.35          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 147.75          |
++                          +--------------+--------------+---------+-----------------+
+|                          | 30720        | BF16         | 1       | 105.88          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int8    | 1       | 118.38          |
++                          +              +--------------+---------+-----------------+
+|                          |              | GPTQ-Int4    | 1       | 129.28          |
++                          +              +--------------+---------+-----------------+
+|                          |              | AWQ          | 1       | 127.19          |
 +--------------------------+--------------+--------------+---------+-----------------+
 
-Note: Compared with dense models, MOE models have larger throughput when batch size is large, which is shown as follows:
 
-+--------------------------+--------------+-------------+------+----------+
-| Model                    | Quantization | # Prompts   | QPS  | Tokens/s |
-+==========================+==============+=============+======+==========+
-| Qwen1.5-32B-Chat         | BF16         | 100         | 6.68 | 7343.56  |
-+--------------------------+--------------+-------------+------+----------+
-| Qwen2-57B-A14B-Instruct  | BF16         | 100         | 4.81 | 5291.15  |
-+--------------------------+--------------+-------------+------+----------+
-| Qwen1.5-32B-Chat         | BF16         | 1000        | 7.99 | 8791.35  |
-+--------------------------+--------------+-------------+------+----------+
-| Qwen2-57B-A14B-Instruct  | BF16         | 1000        | 5.18 | 5698.37  |
-+--------------------------+--------------+-------------+------+----------+
 
-The results are obtained from vLLM throughput benchmarking scripts, which can be reproduced by:
+-  7B (Transformer)
+
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
++=============================+==============+==============+=========+=================+================+=========================+
+| Qwen2.5-7B-Instruct         | 1            | BF16         | 1       | 40.38           | 14.38          |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int8    | 1       | 31.55           | 8.42           | auto_gptq==0.6.0+cu1210 |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int4    | 1       | 43.10           | 5.52           |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | AWQ          | 1       | 32.03           | 5.39           |                         |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                             | 6144         | BF16         | 1       | 38.76           | 15.38          |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int8    | 1       | 31.26           | 9.43           | auto_gptq==0.6.0+cu1210 |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int4    | 1       | 38.27           | 6.52           |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | AWQ          | 1       | 32.37           | 6.39           |                         |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                             | 14336        | BF16         | 1       | 29.78           | 16.91          |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int8    | 1       | 26.86           | 10.96          | auto_gptq==0.6.0+cu1210 |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int4    | 1       | 28.70           | 8.05           |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | AWQ          | 1       | 30.23           | 7.92           |                         |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                             | 30720        | BF16         | 1       | 18.83           | 19.97          |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int8    | 1       | 17.59           | 14.01          | auto_gptq==0.6.0+cu1210 |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | GPTQ-Int4    | 1       | 18.45           | 11.11          |                         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------+
+|                             |              | AWQ          | 1       | 19.11           | 10.98          |                         |
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
+
+
+
+-  7B (vLLM)
+
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
++=============================+==============+==============+=========+=================+================+===========================================+
+| Qwen2.5-7B-Instruct         | 1            | BF16         | 1       | 84.28           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 122.01          |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 154.05          |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 148.10          |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 80.70           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 112.38          |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 141.98          |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 137.64          |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 77.69           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 105.25          |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 129.35          |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 124.91          |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 1       | 70.33           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 90.71           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 108.30          |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 104.66          |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 1       | 50.86           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 60.52           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 67.97           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 66.42           |                | setting-64k                               |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 1       | 28.94           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 25.97           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 26.37           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 26.57           |                | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+
+  * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
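+
+As a reference for how these options are used, below is a minimal sketch of an
+offline vLLM call with the ``[Setting-64k]`` arguments; the prompt and the
+2048-token output length are illustrative placeholders rather than the exact
+benchmark harness.
+
+.. code:: python
+
+   from vllm import LLM, SamplingParams
+
+   # [Setting-64k]: 90% of GPU memory, 64k context window, CUDA graphs enabled
+   llm = LLM(
+       model="Qwen/Qwen2.5-7B-Instruct",
+       gpu_memory_utilization=0.9,
+       max_model_len=65536,
+       enforce_eager=False,
+   )
+
+   sampling_params = SamplingParams(max_tokens=2048)  # illustrative output length
+   outputs = llm.generate(["Summarize the following document ..."], sampling_params)
+   print(outputs[0].outputs[0].text)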
+
+- 14B (Transformer)
+
++--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
+| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
++==========================+==============+==============+=========+=================+================+=========================+
+| Qwen2.5-14B-Instruct     | 1            | BF16         | 1       | 24.74           | 28.08          |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 18.84           | 16.11          | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 25.89           | 9.94           |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 19.23           | 9.79           |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 6144         | BF16         | 1       | 20.51           | 29.50          |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 17.80           | 17.61          | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 20.06           | 11.36          |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 19.21           | 11.22          |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 14336        | BF16         | 1       | 13.92           | 31.95          |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 12.66           | 19.98          | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 13.79           | 13.81          |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 14.17           | 13.67          |                         |
++                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
+|                          | 30720        | BF16         | 1       | 8.20            | 36.85          |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int8    | 1       | 7.77            | 24.88          | auto_gptq==0.6.0+cu1210 |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | GPTQ-Int4    | 1       | 8.14            | 18.71          |                         |
++                          +              +--------------+---------+-----------------+----------------+-------------------------+
+|                          |              | AWQ          | 1       | 8.31            | 18.57          |                         |
++--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
+
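+For orientation, a minimal sketch of how a single cell of a Transformer table
+(decoding speed and peak GPU memory) could be measured with Hugging Face
+``transformers``; the dummy prompt, the 2048-token generation length, and the
+timing granularity are illustrative assumptions, not the exact benchmark script.
+
+.. code:: python
+
+   import time
+
+   import torch
+   from transformers import AutoModelForCausalLM, AutoTokenizer
+
+   model_id = "Qwen/Qwen2.5-14B-Instruct"  # use a GPTQ/AWQ repo for the quantized rows
+   tokenizer = AutoTokenizer.from_pretrained(model_id)
+   model = AutoModelForCausalLM.from_pretrained(
+       model_id,
+       torch_dtype=torch.bfloat16,
+       attn_implementation="flash_attention_2",
+       device_map="auto",
+   )
+
+   # Dummy input of the target length (e.g. the 6144-token row).
+   input_ids = torch.full((1, 6144), tokenizer.eos_token_id, device=model.device)
+
+   start = time.time()
+   output = model.generate(
+       input_ids, max_new_tokens=2048, min_new_tokens=2048, do_sample=False
+   )
+   elapsed = time.time() - start
+
+   new_tokens = output.shape[1] - input_ids.shape[1]
+   print(f"speed: {new_tokens / elapsed:.2f} tokens/s")
+   print(f"memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
+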
+
+- 14B (vLLM)
+
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
++=============================+==============+==============+=========+=================+================+===========================================+
+| Qwen2.5-14B-Instruct        | 1            | BF16         | 1       | 46.30           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 70.40           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 98.02           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 92.66           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 43.83           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 64.33           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 86.10           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 83.11           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 41.91           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 59.21           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 76.85           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 74.03           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 1       | 37.18           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 49.23           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 60.91           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 59.01           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 1       | 26.85           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 32.83           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 37.67           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 36.71           |                | setting-64k                               |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 1       | 14.53           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 15.10           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 15.13           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 15.25           |                | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+
+  * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
+
+
+
+- 32B (Transformer)
+
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
++=============================+==============+==============+=========+=================+================+===========================================+
+| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 17.54           | 61.58          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 14.52           | 33.56          | auto_gptq==0.6.0+cu1210                   |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 19.20           | 18.94          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 14.60           | 18.67          |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 12.49           | 63.72          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 11.61           | 35.86          | auto_gptq==0.6.0+cu1210                   |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 13.42           | 21.09          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 13.81           | 20.81          |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 8.95            | 67.31          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 8.53            | 39.28          | auto_gptq==0.6.0+cu1210                   |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 9.48            | 24.67          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 9.71            | 24.39          |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 1       | 5.59            | 74.47          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 5.42            | 46.45          | auto_gptq==0.6.0+cu1210                   |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 5.79            | 31.84          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 5.85            | 31.56          |                                           |
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+
+
+
+
+
+- 32B (vLLM)
+
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
++=============================+==============+==============+=========+=================+================+===========================================+
+| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 22.13           |                | Setting 1                                 |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 37.57           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 55.83           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 51.92           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 21.05           |                | Setting 1                                 |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 34.67           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 49.96           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 46.68           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 19.91           |                | Setting 1                                 |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 31.89           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 44.79           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 41.83           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 2       | 31.82           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 26.88           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 35.66           |                |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 33.75           |                |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 2       | 24.45           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 18.60           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 22.72           |                | setting-64k                               |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 21.79           |                | setting-64k                               |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 2       | 14.31           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 9.77            |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 10.39           |                | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 10.34           |                | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+
+  * For context length 129024, the model must be run with "model_max_length"=131072 in its configuration (see the sketch after this list).
+  * [Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)
+  * [Setting 1]=(gpu_memory_utilization=1.0 max_model_len=32768 enforce_eager=True)
+  * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
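+
+A minimal sketch of applying that ``model_max_length`` value by editing a local
+checkpoint's JSON configuration; the note above does not name the file, so the
+path below is a hypothetical placeholder (``tokenizer_config.json`` is where
+Hugging Face checkpoints commonly carry this key).
+
+.. code:: python
+
+   import json
+   from pathlib import Path
+
+   # Hypothetical path: point this at the local copy of the checkpoint.
+   config_path = Path("/path/to/Qwen2.5-32B-Instruct/tokenizer_config.json")
+
+   config = json.loads(config_path.read_text())
+   config["model_max_length"] = 131072  # required for the 129024-token rows above
+   config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False))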
+
 
-``python vllm/benchmarks/benchmark_throughput.py --input-len 1000 --output-len 100 --model <model_path> --num-prompts <number of prompts> --enforce-eager -tp 2``
 
 - 72B (Transformer)
 
-+--------------------+--------------+--------------+---------+-----------------+----------------+
-| Model              | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
-+====================+==============+==============+=========+=================+================+
-| Qwen2-72B-Instruct | 1            | BF16         | 2       | 7.45            | 134.74         |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 7.30            | 71.00          |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 1       | 9.05            | 41.80          |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 1       | 9.96            | 41.31          |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 6144         | BF16         | 2       | 5.99            | 144.38         |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 5.93            | 80.60          |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 1       | 6.79            | 47.90          |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 1       | 7.49            | 47.42          |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 14336        | BF16         | 3       | 4.12            | 169.93         |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 4.43            | 95.14          |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 1       | 4.87            | 57.79          |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 1       | 5.23            | 57.30          |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 30720        | BF16         | 3       | 2.86            | 209.03         |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 2.83            | 124.20         |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 3.02            | 107.94         |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 1.85            | 88.60          |
-+--------------------+--------------+--------------+---------+-----------------+----------------+
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
++=============================+==============+==============+=========+=================+================+===========================================+
+| Qwen2.5-72B-Instruct        | 1            | BF16         | 2       | 8.73            | 136.20         |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 2       | 8.66            | 72.61          |           auto_gptq==0.6.0+cu1210         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 11.07           | 39.91          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 11.50           | 39.44          |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 2       | 6.39            | 140.00         |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 2       | 6.39            | 77.81          |           auto_gptq==0.6.0+cu1210         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 7.56            | 42.50          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 8.17            | 42.13          |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 3       | 4.25            | 149.14         |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 2       | 4.66            | 82.55          |           auto_gptq==0.6.0+cu1210         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 5.27            | 46.86          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 5.57            | 46.38          |                                           |
++                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 3       | 2.94            | 164.79         |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 2       | 2.94            | 94.75          |           auto_gptq==0.6.0+cu1210         |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 2       | 3.14            | 62.57          |                                           |
++                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                             |              | AWQ          | 2       | 3.23            | 61.64          |                                           |
++-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+
+
 
 
 - 72B (vLLM)
 
-+--------------------+--------------+--------------+---------+-----------------+----------------+
-| Model              | Input Length | Quantization | GPU Num | Speed(tokens/s) | Setting        |
-+====================+==============+==============+=========+=================+================+
-| Qwen2-72B-Instruct | 1            | BF16         | 2       | 17.68           | [Setting 1]    |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | BF16         | 4       | 30.01           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 27.56           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 1       | 29.60           | [Setting 2]    |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 42.82           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 27.73           | -              |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 6144         | BF16         | 4       | 27.98           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 25.46           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 1       | 25.16           | [Setting 3]    |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 38.23           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 25.77           | -              |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 14336        | BF16         | 4       | 21.81           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 22.71           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 26.54           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 21.50           | -              |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 30720        | BF16         | 4       | 19.43           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 18.69           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 23.12           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 18.09           | -              |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 30720        | BF16         | 4       | 19.43           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 18.69           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 23.12           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 18.09           | -              |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 63488        | BF16         | 4       | 17.46           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 2       | 15.30           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 13.23           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 13.14           | -              |
-+                    +--------------+--------------+---------+-----------------+----------------+
-|                    | 129024       | BF16         | 4       | 11.70           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int8    | 4       | 12.94           | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | GPTQ-Int4    | 2       | 8.33            | -              |
-+                    +              +--------------+---------+-----------------+----------------+
-|                    |              | AWQ          | 2       | 7.78            | -              |
-+--------------------+--------------+--------------+---------+-----------------+----------------+
++------------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+| Model                        | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
++==============================+==============+==============+=========+=================+================+===========================================+
+| Qwen2.5-72B-Instruct         | 1            | BF16         | 2       | 18.19           |                | Setting 1                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | BF16         | 4       | 31.37           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 31.40           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 1       | 16.47           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 46.30           |                | Setting 2                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 44.30           |                | Default                                   |
++                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              | 6144         | BF16         | 4       | 29.90           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 29.37           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 1       | 13.88           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 42.50           |                | Setting 3                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 40.67           |                | Default                                   |
++                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              | 14336        | BF16         | 4       | 30.10           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 27.20           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 38.10           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 36.63           |                | Default                                   |
++                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              | 30720        | BF16         | 4       | 27.53           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 23.32           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 30.98           |                | Default                                   |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 30.02           |                | Default                                   |
++                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              | 63488        | BF16         | 4       | 20.74           |                | Setting 4                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 16.27           |                | Setting 4                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 19.84           |                | Setting 4                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 19.32           |                | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              | 129024       | BF16         | 4       | 12.68           |                | Setting 5                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 4       | 14.11           |                | Setting 5                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 10.11           |                | Setting 5                                 |
++                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 9.88            |                | Setting 5                                 |
++------------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
 
   * [Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)
   * [Setting 1]=(gpu_memory_utilization=0.98 max_model_len=4096 enforce_eager=True)
   * [Setting 2]=(gpu_memory_utilization=1.0 max_model_len=4096 enforce_eager=True)
   * [Setting 3]=(gpu_memory_utilization=1.0 max_model_len=8192 enforce_eager=True)
+  * [Setting 4]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
+  * [Setting 5]=(gpu_memory_utilization=0.9 max_model_len=131072 enforce_eager=False)
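+
+A minimal sketch, assuming the multi-GPU rows above use vLLM's
+``tensor_parallel_size``, combining it with ``[Setting 5]`` as in the BF16 row
+at 129024 tokens; the prompt and output length are illustrative placeholders.
+
+.. code:: python
+
+   from vllm import LLM, SamplingParams
+
+   # [Setting 5] across 4 GPUs (tensor parallel), matching the "GPU Num = 4" rows.
+   llm = LLM(
+       model="Qwen/Qwen2.5-72B-Instruct",
+       tensor_parallel_size=4,
+       gpu_memory_utilization=0.9,
+       max_model_len=131072,
+       enforce_eager=False,
+   )
+
+   outputs = llm.generate(
+       ["A very long document to summarize ..."],
+       SamplingParams(max_tokens=2048),  # illustrative output length
+   )
+   print(outputs[0].outputs[0].text)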