diff --git a/README.md b/README.md
index 4181bb01de9c8c140932e2f4f77ed906b9a55907..7f6e35c214fafbcd9f8237434582608f6147922b 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ To learn more about Qwen2.5, feel free to read our documentation \[[EN](https://
 - Quantization: the practice of quantizing LLMs with GPTQ, AWQ, as well as the guidance for how to make high-quality quantized GGUF files;
 - Training: the instructions for post-training, including SFT and RLHF (TODO) with frameworks like Axolotl, LLaMA-Factory, etc.
 - Framework: the usage of Qwen with frameworks for application, e.g., RAG, Agent, etc.
-- Benchmark: the statistics about inference speed and memory footprint (to be updated for Qwen2.5).
+- Benchmark: the statistics about inference speed and memory footprint (available for Qwen2.5).
 
 ## Introduction
 
@@ -37,7 +37,7 @@ In the past three months since Qwen2's release, numerous developers have built n
 
 ## News
 
-- 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more! 
+- 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
 - 2024.06.06: We released the Qwen2 series. Check our [blog](https://qwenlm.github.io/blog/qwen2/)!
 - 2024.03.28: We released the first MoE model of Qwen: Qwen1.5-MoE-A2.7B! Temporarily, only HF transformers and vLLM support the model. We will soon add the support of llama.cpp, mlx-lm, etc. Check our [blog](https://qwenlm.github.io/blog/qwen-moe/) for more information!
 - 2024.02.05: We released the Qwen1.5 series.
@@ -46,7 +46,7 @@ In the past three months since Qwen2's release, numerous developers have built n
 
 Detailed evaluation results are reported in this <a href="https://qwenlm.github.io/blog/qwen2.5/"> 📑 blog</a>.
 
-For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html) (to be updated for Qwen2.5).
+For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
 
 ## Quickstart
 
diff --git a/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po b/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po
index c89449288da31aba0be79f1940b045da9e4f39f9..e1424ea10b289b38ccccbf13bee781920d8fcfec 100644
--- a/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po
+++ b/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po
@@ -7,7 +7,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: Qwen \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2024-09-18 21:18+0800\n"
+"POT-Creation-Date: 2024-10-31 15:54+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -16,219 +16,244 @@ msgstr ""
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=utf-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"Generated-By: Babel 2.15.0\n"
+"Generated-By: Babel 2.16.0\n"
 
 #: ../../source/benchmark/speed_benchmark.rst:2
-#: 96f9c969f82049efbaf7b70525976649
-msgid "Speed Benchmark"
+#: c37062a883c842a2b89fc3971b2209cb
+msgid "Qwen2.5 Speed Benchmark"
 msgstr "效率评估"
 
 #: ../../source/benchmark/speed_benchmark.rst:5
-#: 3e97857c19314350b1d6686ad9776d35
-msgid "To be updated for Qwen2.5."
-msgstr "Qwen2.5结果待更新,由于模型结构差异有限,Qwen2结果可供参考。"
+#: 5577386104e04ce0820d75b8d4a4b9bb
+msgid "This section reports the speed performance of bf16 models, quantized models (including GPTQ-Int4, GPTQ-Int8 and AWQ) of the Qwen2.5 series. Specifically, we report the inference speed (tokens/s) as well as memory footprint (GB) under the conditions of different context lengths."
+msgstr "本部分介绍Qwen2.5系列模型(原始模型和量化模型)的效率测试结果,包括推理速度(tokens/s)与不同上下文长度时的显存占用(GB)。"
 
-#: ../../source/benchmark/speed_benchmark.rst:7
-#: 4f0e196db456466997765e4b93b873be
-msgid "This section reports the speed performance of bf16 models, quantized models (including GPTQ-Int4, GPTQ-Int8 and AWQ) of the Qwen2 series. Specifically, we report the inference speed (tokens/s) as well as memory footprint (GB) under the conditions of different context lengths."
-msgstr "本部分介绍Qwen2模型(原始模型和量化模型)的效率测试结果,包括推理速度(tokens/s)与不同上下文长度时的显存占用(GB)。"
-
-#: ../../source/benchmark/speed_benchmark.rst:12
-#: d3a3a79f4010466f882bd52955780253
+#: ../../source/benchmark/speed_benchmark.rst:10
+#: 9edf3184b2694e6d9dee05c519bea1ae
 msgid "The environment of the evaluation with huggingface transformers is:"
 msgstr "测试HuggingFace ``transformers`` 时的环境配置:"
 
-#: ../../source/benchmark/speed_benchmark.rst:14
-#: ../../source/benchmark/speed_benchmark.rst:24
-#: 8e1e5f8b79c54381b4bf00c8637954c8
+#: ../../source/benchmark/speed_benchmark.rst:12
+#: ../../source/benchmark/speed_benchmark.rst:23
+#: 5929629b0bf143ab983efd4e2aa964c8 b619da3afa86420ba7e2583d9a5e7c39
 msgid "NVIDIA A100 80GB"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:15
-#: ../../source/benchmark/speed_benchmark.rst:25
-#: 79bb2a6aea064df79c0819b9c966b867
-msgid "CUDA 11.8"
+#: ../../source/benchmark/speed_benchmark.rst:13
+#: ../../source/benchmark/speed_benchmark.rst:24
+#: 6986d9f22df54554a9e830b3828a5ed2 a4e87ae3bd2042429b8df23c779f6373
+msgid "CUDA 12.1"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:16
-#: 6ed8b5fb474842c18b4e319eebbbb73f
-msgid "Pytorch 2.1.2+cu118"
+#: ../../source/benchmark/speed_benchmark.rst:14
+#: 190e255dcd1e469294188508b49bf98c
+msgid "Pytorch 2.3.1"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:17
-#: e6429404fdc543e1a80c811b9ef32e2a
-msgid "Flash Attention 2.3.3"
+#: ../../source/benchmark/speed_benchmark.rst:15
+#: c693c6e715074b2daa95c62064b4e79e
+msgid "Flash Attention 2.5.8"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:18
-#: a02f0bd8337949288a07caa4704aa55a
-msgid "Transformers 4.38.2"
+#: ../../source/benchmark/speed_benchmark.rst:16
+#: ../../source/benchmark/speed_benchmark.rst:28
+#: 3796f99ed359444da30190e7a3b86428 bfc7d82414fa46a58a09553f4c703af6
+msgid "Transformers 4.46.0"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:19
-#: 7072898a11164f7ca15acca7edaca4f9
-msgid "AutoGPTQ 0.7.1"
+#: ../../source/benchmark/speed_benchmark.rst:17
+#: 427aa447657849cba460032041380f2e
+msgid "AutoGPTQ 0.7.1+cu121 (Compiled from source code)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:20
-#: 76bdca0175824567908b4cbc83c02731
-msgid "AutoAWQ 0.2.4"
+#: ../../source/benchmark/speed_benchmark.rst:18
+#: aabddb4e2b0244ea9c27788ce453f30e
+msgid "AutoAWQ 0.2.6"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:22
-#: 568b1b0c821d4af199a3d6122f38d1ea
+#: ../../source/benchmark/speed_benchmark.rst:21
+#: 8f4e975fbc9d48f18cb30d75a9f335db
 msgid "The environment of the evaluation with vLLM is:"
 msgstr "测试vLLM时的环境配置:"
 
-#: ../../source/benchmark/speed_benchmark.rst:26
-#: 73515f5745e148cc8ddf1e1ae1c9da3b
-msgid "Pytorch 2.3.0+cu118"
-msgstr ""
-
-#: ../../source/benchmark/speed_benchmark.rst:27
-#: 3a291f04fa1f4c86b646c28625f36868
-msgid "Flash Attention 2.5.6"
+#: ../../source/benchmark/speed_benchmark.rst:25
+#: 4fd3b3a5e61747f6b1577593d144efe0
+msgid "vLLM 0.6.3"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:28
-#: 0c0ef00c714a43d3a34f3084b4198415
-msgid "Transformers 4.40.1"
+#: ../../source/benchmark/speed_benchmark.rst:26
+#: a5ed9f5e4c164ddaa94443aaf9fad845
+msgid "Pytorch 2.4.0"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:29
-#: 9f06eca79d96433e8aa3bc7ddc2bd2f0
-msgid "vLLM 0.4.2"
+#: ../../source/benchmark/speed_benchmark.rst:27
+#: c10ed8e23a3f4876908373715e50d88b
+msgid "Flash Attention 2.6.3"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:31
-#: 7347c156175b4d91b0257d11781cbef3
-msgid "Note:"
+#: 408f8720e08641238a86c8976c54b69f
+msgid "Notes:"
 msgstr "注意:"
 
 #: ../../source/benchmark/speed_benchmark.rst:33
-#: aa77ff662e564d6dbf92c329438dc9c8
-msgid "We use the batch size of 1 and the least number of GPUs as possible for the evalution."
+#: 721fca542cbe44dca41c5209a83b2df7
+msgid "We use the batch size of 1 and the least number of GPUs as possible for the evaluation."
 msgstr "batch size 设置为1,使用 GPU 数量尽可能少"
 
 #: ../../source/benchmark/speed_benchmark.rst:35
-#: 012d76e949c04800b07ec12b30179c15
-msgid "We test the speed and memory of generating 2048 tokens with the input lengths of 1, 6144, 14336, 30720, 63488, and 129024 tokens (\\>32k is only avaliable for Qwen2-72B-Instuct and Qwen2-7B-Instuct)."
+#: 974fc88f26354dce8f66862962a6a420
+msgid "We test the speed and memory of generating 2048 tokens with the input lengths of 1, 6144, 14336, 30720, 63488, and 129024 tokens."
 msgstr "我们测试生成2048 tokens时的速度与显存占用,输入长度分别为1、6144、14336、30720、63488、129024 tokens。(超过32K长度仅有 Qwen2-72B-Instuct 与 Qwen2-7B-Instuct 支持)"
 
 #: ../../source/benchmark/speed_benchmark.rst:38
-#: 9c656be43bf3416988786e8e97236550
+#: cfe1df792a90474983a34723522f5550
 msgid "For vLLM, the memory usage is not reported because it pre-allocates all GPU memory. We use ``gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False`` by default."
 msgstr "对于vLLM,由于GPU显存预分配,实际显存使用难以评估。默认情况下,统一设定为``gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False``。"
 
-#: ../../source/benchmark/speed_benchmark.rst:43
-#: 17ae27b5cd9a48b99665eb630c771d80
+#: ../../source/benchmark/speed_benchmark.rst:44
+#: e81c536e4ad641709eb6d3af109a5464
 msgid "0.5B (Transformer)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:46
-#: ../../source/benchmark/speed_benchmark.rst:84
-#: ../../source/benchmark/speed_benchmark.rst:123
-#: ../../source/benchmark/speed_benchmark.rst:161
-#: ../../source/benchmark/speed_benchmark.rst:200
-#: ../../source/benchmark/speed_benchmark.rst:239
-#: ../../source/benchmark/speed_benchmark.rst:294
-#: ../../source/benchmark/speed_benchmark.rst:316
-#: ../../source/benchmark/speed_benchmark.rst:330
-#: ../../source/benchmark/speed_benchmark.rst:348
-#: ../../source/benchmark/speed_benchmark.rst:387
-#: b729c12bd207442c984700a48bfdb3ff
+#: ../../source/benchmark/speed_benchmark.rst:47
+#: ../../source/benchmark/speed_benchmark.rst:86
+#: ../../source/benchmark/speed_benchmark.rst:126
+#: ../../source/benchmark/speed_benchmark.rst:165
+#: ../../source/benchmark/speed_benchmark.rst:205
+#: ../../source/benchmark/speed_benchmark.rst:244
+#: ../../source/benchmark/speed_benchmark.rst:284
+#: ../../source/benchmark/speed_benchmark.rst:324
+#: ../../source/benchmark/speed_benchmark.rst:381
+#: ../../source/benchmark/speed_benchmark.rst:420
+#: ../../source/benchmark/speed_benchmark.rst:479
+#: ../../source/benchmark/speed_benchmark.rst:521
+#: ../../source/benchmark/speed_benchmark.rst:583
+#: ../../source/benchmark/speed_benchmark.rst:624
+#: 29a713e52e01489885933e2a60b8900a b535d72a52684e25b72c546ec96397a1
 msgid "Model"
 msgstr "模型"
 
-#: ../../source/benchmark/speed_benchmark.rst:46
-#: ../../source/benchmark/speed_benchmark.rst:84
-#: ../../source/benchmark/speed_benchmark.rst:123
-#: ../../source/benchmark/speed_benchmark.rst:161
-#: ../../source/benchmark/speed_benchmark.rst:200
-#: ../../source/benchmark/speed_benchmark.rst:239
-#: ../../source/benchmark/speed_benchmark.rst:294
-#: ../../source/benchmark/speed_benchmark.rst:316
-#: ../../source/benchmark/speed_benchmark.rst:348
-#: ../../source/benchmark/speed_benchmark.rst:387
-#: a8779c4e04c74dd0a24836465b57794e
+#: ../../source/benchmark/speed_benchmark.rst:47
+#: ../../source/benchmark/speed_benchmark.rst:86
+#: ../../source/benchmark/speed_benchmark.rst:126
+#: ../../source/benchmark/speed_benchmark.rst:165
+#: ../../source/benchmark/speed_benchmark.rst:205
+#: ../../source/benchmark/speed_benchmark.rst:244
+#: ../../source/benchmark/speed_benchmark.rst:284
+#: ../../source/benchmark/speed_benchmark.rst:324
+#: ../../source/benchmark/speed_benchmark.rst:381
+#: ../../source/benchmark/speed_benchmark.rst:420
+#: ../../source/benchmark/speed_benchmark.rst:479
+#: ../../source/benchmark/speed_benchmark.rst:521
+#: ../../source/benchmark/speed_benchmark.rst:583
+#: ../../source/benchmark/speed_benchmark.rst:624
+#: 4b84c313f4b2499eafa9ea8bd982851a c8ec43dd253e4cb9a87c537e369a6133
 msgid "Input Length"
 msgstr "输入长度"
 
-#: ../../source/benchmark/speed_benchmark.rst:46
-#: ../../source/benchmark/speed_benchmark.rst:84
-#: ../../source/benchmark/speed_benchmark.rst:123
-#: ../../source/benchmark/speed_benchmark.rst:161
-#: ../../source/benchmark/speed_benchmark.rst:200
-#: ../../source/benchmark/speed_benchmark.rst:239
-#: ../../source/benchmark/speed_benchmark.rst:294
-#: ../../source/benchmark/speed_benchmark.rst:316
-#: ../../source/benchmark/speed_benchmark.rst:330
-#: ../../source/benchmark/speed_benchmark.rst:348
-#: ../../source/benchmark/speed_benchmark.rst:387
-#: 2543cd648b094e00990b63d343b882e8
+#: ../../source/benchmark/speed_benchmark.rst:47
+#: ../../source/benchmark/speed_benchmark.rst:86
+#: ../../source/benchmark/speed_benchmark.rst:126
+#: ../../source/benchmark/speed_benchmark.rst:165
+#: ../../source/benchmark/speed_benchmark.rst:205
+#: ../../source/benchmark/speed_benchmark.rst:244
+#: ../../source/benchmark/speed_benchmark.rst:284
+#: ../../source/benchmark/speed_benchmark.rst:324
+#: ../../source/benchmark/speed_benchmark.rst:381
+#: ../../source/benchmark/speed_benchmark.rst:420
+#: ../../source/benchmark/speed_benchmark.rst:479
+#: ../../source/benchmark/speed_benchmark.rst:521
+#: ../../source/benchmark/speed_benchmark.rst:583
+#: ../../source/benchmark/speed_benchmark.rst:624
+#: 80d85daadc3b41a091ccc3d16622d3dc 8383415081a248cfbc6468a46ec446a7
 msgid "Quantization"
 msgstr "量化"
 
-#: ../../source/benchmark/speed_benchmark.rst:46
-#: ../../source/benchmark/speed_benchmark.rst:84
-#: ../../source/benchmark/speed_benchmark.rst:123
-#: ../../source/benchmark/speed_benchmark.rst:161
-#: ../../source/benchmark/speed_benchmark.rst:200
-#: ../../source/benchmark/speed_benchmark.rst:239
-#: ../../source/benchmark/speed_benchmark.rst:294
-#: ../../source/benchmark/speed_benchmark.rst:316
-#: ../../source/benchmark/speed_benchmark.rst:348
-#: ../../source/benchmark/speed_benchmark.rst:387
-#: fe3ada130b30408e8e4736a55b0f8b9c
+#: ../../source/benchmark/speed_benchmark.rst:47
+#: ../../source/benchmark/speed_benchmark.rst:86
+#: ../../source/benchmark/speed_benchmark.rst:126
+#: ../../source/benchmark/speed_benchmark.rst:165
+#: ../../source/benchmark/speed_benchmark.rst:205
+#: ../../source/benchmark/speed_benchmark.rst:244
+#: ../../source/benchmark/speed_benchmark.rst:284
+#: ../../source/benchmark/speed_benchmark.rst:324
+#: ../../source/benchmark/speed_benchmark.rst:381
+#: ../../source/benchmark/speed_benchmark.rst:420
+#: ../../source/benchmark/speed_benchmark.rst:479
+#: ../../source/benchmark/speed_benchmark.rst:521
+#: ../../source/benchmark/speed_benchmark.rst:583
+#: ../../source/benchmark/speed_benchmark.rst:624
+#: 72044b25841e4096a9f616a9dad358b5 83e8c98032fd445b8eec1980b8dc0967
 msgid "GPU Num"
 msgstr "GPU数量"
 
-#: ../../source/benchmark/speed_benchmark.rst:46
-#: ../../source/benchmark/speed_benchmark.rst:84
-#: ../../source/benchmark/speed_benchmark.rst:123
-#: ../../source/benchmark/speed_benchmark.rst:161
-#: ../../source/benchmark/speed_benchmark.rst:200
-#: ../../source/benchmark/speed_benchmark.rst:239
-#: ../../source/benchmark/speed_benchmark.rst:294
-#: ../../source/benchmark/speed_benchmark.rst:316
-#: ../../source/benchmark/speed_benchmark.rst:348
-#: ../../source/benchmark/speed_benchmark.rst:387
-#: 6ab3aea03cb44388b6bd7ee3a1d7684c
+#: ../../source/benchmark/speed_benchmark.rst:47
+#: ../../source/benchmark/speed_benchmark.rst:86
+#: ../../source/benchmark/speed_benchmark.rst:126
+#: ../../source/benchmark/speed_benchmark.rst:165
+#: ../../source/benchmark/speed_benchmark.rst:205
+#: ../../source/benchmark/speed_benchmark.rst:244
+#: ../../source/benchmark/speed_benchmark.rst:284
+#: ../../source/benchmark/speed_benchmark.rst:324
+#: ../../source/benchmark/speed_benchmark.rst:381
+#: ../../source/benchmark/speed_benchmark.rst:420
+#: ../../source/benchmark/speed_benchmark.rst:479
+#: ../../source/benchmark/speed_benchmark.rst:521
+#: ../../source/benchmark/speed_benchmark.rst:583
+#: ../../source/benchmark/speed_benchmark.rst:624
+#: 009e83260f8f47c0a059732347b8fd99 b088a9cd762f4f66ac803b136592d804
 msgid "Speed(tokens/s)"
 msgstr "速度 (tokens/s)"
 
-#: ../../source/benchmark/speed_benchmark.rst:46
-#: ../../source/benchmark/speed_benchmark.rst:123
-#: ../../source/benchmark/speed_benchmark.rst:200
-#: ../../source/benchmark/speed_benchmark.rst:294
-#: ../../source/benchmark/speed_benchmark.rst:348
-#: 163d63199ed74a65b096283cf0a6b3df
+#: ../../source/benchmark/speed_benchmark.rst:47
+#: ../../source/benchmark/speed_benchmark.rst:126
+#: ../../source/benchmark/speed_benchmark.rst:205
+#: ../../source/benchmark/speed_benchmark.rst:284
+#: ../../source/benchmark/speed_benchmark.rst:381
+#: ../../source/benchmark/speed_benchmark.rst:479
+#: ../../source/benchmark/speed_benchmark.rst:583
+#: d303cdcb58e2427e8d4302c7ff31e554 e51ecc8c63724c2f8cd13dd5e4f9c145
 msgid "GPU Memory(GB)"
 msgstr "显存占用 (GB)"
 
-#: ../../source/benchmark/speed_benchmark.rst:48
-#: ../../source/benchmark/speed_benchmark.rst:86
-#: d8e54251b281451891a5332d2d1919c2
-msgid "Qwen2-0.5B-Instruct"
-msgstr ""
-
-#: ../../source/benchmark/speed_benchmark.rst:48
-#: ../../source/benchmark/speed_benchmark.rst:50
-#: ../../source/benchmark/speed_benchmark.rst:52
-#: ../../source/benchmark/speed_benchmark.rst:54
-#: ../../source/benchmark/speed_benchmark.rst:56
-#: ../../source/benchmark/speed_benchmark.rst:58
-#: ../../source/benchmark/speed_benchmark.rst:60
-#: ../../source/benchmark/speed_benchmark.rst:62
-#: ../../source/benchmark/speed_benchmark.rst:64
-#: ../../source/benchmark/speed_benchmark.rst:66
-#: ../../source/benchmark/speed_benchmark.rst:68
-#: ../../source/benchmark/speed_benchmark.rst:70
-#: ../../source/benchmark/speed_benchmark.rst:72
-#: ../../source/benchmark/speed_benchmark.rst:74
-#: ../../source/benchmark/speed_benchmark.rst:76
-#: ../../source/benchmark/speed_benchmark.rst:78
-#: ../../source/benchmark/speed_benchmark.rst:86
+#: ../../source/benchmark/speed_benchmark.rst:47
+#: ../../source/benchmark/speed_benchmark.rst:126
+#: ../../source/benchmark/speed_benchmark.rst:205
+#: ../../source/benchmark/speed_benchmark.rst:284
+#: ../../source/benchmark/speed_benchmark.rst:324
+#: ../../source/benchmark/speed_benchmark.rst:381
+#: ../../source/benchmark/speed_benchmark.rst:420
+#: ../../source/benchmark/speed_benchmark.rst:479
+#: ../../source/benchmark/speed_benchmark.rst:521
+#: ../../source/benchmark/speed_benchmark.rst:583
+#: ../../source/benchmark/speed_benchmark.rst:624
+#: 974200ffda5f454cad058306d29c01f5
+msgid "Note"
+msgstr "备注"
+
+#: ../../source/benchmark/speed_benchmark.rst:49
+#: ../../source/benchmark/speed_benchmark.rst:88
+#: 963269fc133c4b58a0b272708a3cd91e
+msgid "Qwen2.5-0.5B-Instruct"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:49
+#: ../../source/benchmark/speed_benchmark.rst:51
+#: ../../source/benchmark/speed_benchmark.rst:53
+#: ../../source/benchmark/speed_benchmark.rst:55
+#: ../../source/benchmark/speed_benchmark.rst:57
+#: ../../source/benchmark/speed_benchmark.rst:59
+#: ../../source/benchmark/speed_benchmark.rst:61
+#: ../../source/benchmark/speed_benchmark.rst:63
+#: ../../source/benchmark/speed_benchmark.rst:65
+#: ../../source/benchmark/speed_benchmark.rst:67
+#: ../../source/benchmark/speed_benchmark.rst:69
+#: ../../source/benchmark/speed_benchmark.rst:71
+#: ../../source/benchmark/speed_benchmark.rst:73
+#: ../../source/benchmark/speed_benchmark.rst:75
+#: ../../source/benchmark/speed_benchmark.rst:77
+#: ../../source/benchmark/speed_benchmark.rst:79
 #: ../../source/benchmark/speed_benchmark.rst:88
 #: ../../source/benchmark/speed_benchmark.rst:90
 #: ../../source/benchmark/speed_benchmark.rst:92
@@ -244,24 +269,23 @@ msgstr ""
 #: ../../source/benchmark/speed_benchmark.rst:112
 #: ../../source/benchmark/speed_benchmark.rst:114
 #: ../../source/benchmark/speed_benchmark.rst:116
-#: ../../source/benchmark/speed_benchmark.rst:125
-#: ../../source/benchmark/speed_benchmark.rst:127
-#: ../../source/benchmark/speed_benchmark.rst:129
-#: ../../source/benchmark/speed_benchmark.rst:131
-#: ../../source/benchmark/speed_benchmark.rst:133
-#: ../../source/benchmark/speed_benchmark.rst:135
-#: ../../source/benchmark/speed_benchmark.rst:137
-#: ../../source/benchmark/speed_benchmark.rst:139
-#: ../../source/benchmark/speed_benchmark.rst:141
-#: ../../source/benchmark/speed_benchmark.rst:143
-#: ../../source/benchmark/speed_benchmark.rst:145
-#: ../../source/benchmark/speed_benchmark.rst:147
-#: ../../source/benchmark/speed_benchmark.rst:149
-#: ../../source/benchmark/speed_benchmark.rst:151
-#: ../../source/benchmark/speed_benchmark.rst:153
-#: ../../source/benchmark/speed_benchmark.rst:155
-#: ../../source/benchmark/speed_benchmark.rst:163
-#: ../../source/benchmark/speed_benchmark.rst:165
+#: ../../source/benchmark/speed_benchmark.rst:118
+#: ../../source/benchmark/speed_benchmark.rst:128
+#: ../../source/benchmark/speed_benchmark.rst:130
+#: ../../source/benchmark/speed_benchmark.rst:132
+#: ../../source/benchmark/speed_benchmark.rst:134
+#: ../../source/benchmark/speed_benchmark.rst:136
+#: ../../source/benchmark/speed_benchmark.rst:138
+#: ../../source/benchmark/speed_benchmark.rst:140
+#: ../../source/benchmark/speed_benchmark.rst:142
+#: ../../source/benchmark/speed_benchmark.rst:144
+#: ../../source/benchmark/speed_benchmark.rst:146
+#: ../../source/benchmark/speed_benchmark.rst:148
+#: ../../source/benchmark/speed_benchmark.rst:150
+#: ../../source/benchmark/speed_benchmark.rst:152
+#: ../../source/benchmark/speed_benchmark.rst:154
+#: ../../source/benchmark/speed_benchmark.rst:156
+#: ../../source/benchmark/speed_benchmark.rst:158
 #: ../../source/benchmark/speed_benchmark.rst:167
 #: ../../source/benchmark/speed_benchmark.rst:169
 #: ../../source/benchmark/speed_benchmark.rst:171
@@ -276,1690 +300,2673 @@ msgstr ""
 #: ../../source/benchmark/speed_benchmark.rst:189
 #: ../../source/benchmark/speed_benchmark.rst:191
 #: ../../source/benchmark/speed_benchmark.rst:193
-#: ../../source/benchmark/speed_benchmark.rst:202
-#: ../../source/benchmark/speed_benchmark.rst:204
-#: ../../source/benchmark/speed_benchmark.rst:206
-#: ../../source/benchmark/speed_benchmark.rst:208
-#: ../../source/benchmark/speed_benchmark.rst:210
-#: ../../source/benchmark/speed_benchmark.rst:212
-#: ../../source/benchmark/speed_benchmark.rst:214
-#: ../../source/benchmark/speed_benchmark.rst:216
-#: ../../source/benchmark/speed_benchmark.rst:218
-#: ../../source/benchmark/speed_benchmark.rst:220
-#: ../../source/benchmark/speed_benchmark.rst:222
-#: ../../source/benchmark/speed_benchmark.rst:224
-#: ../../source/benchmark/speed_benchmark.rst:226
-#: ../../source/benchmark/speed_benchmark.rst:228
-#: ../../source/benchmark/speed_benchmark.rst:230
-#: ../../source/benchmark/speed_benchmark.rst:232
-#: ../../source/benchmark/speed_benchmark.rst:241
-#: ../../source/benchmark/speed_benchmark.rst:243
-#: ../../source/benchmark/speed_benchmark.rst:245
-#: ../../source/benchmark/speed_benchmark.rst:247
-#: ../../source/benchmark/speed_benchmark.rst:249
-#: ../../source/benchmark/speed_benchmark.rst:251
-#: ../../source/benchmark/speed_benchmark.rst:253
-#: ../../source/benchmark/speed_benchmark.rst:255
-#: ../../source/benchmark/speed_benchmark.rst:257
-#: ../../source/benchmark/speed_benchmark.rst:259
-#: ../../source/benchmark/speed_benchmark.rst:261
-#: ../../source/benchmark/speed_benchmark.rst:263
-#: ../../source/benchmark/speed_benchmark.rst:265
-#: ../../source/benchmark/speed_benchmark.rst:267
-#: ../../source/benchmark/speed_benchmark.rst:269
-#: ../../source/benchmark/speed_benchmark.rst:271
-#: ../../source/benchmark/speed_benchmark.rst:273
-#: ../../source/benchmark/speed_benchmark.rst:275
-#: ../../source/benchmark/speed_benchmark.rst:277
-#: ../../source/benchmark/speed_benchmark.rst:279
-#: ../../source/benchmark/speed_benchmark.rst:281
-#: ../../source/benchmark/speed_benchmark.rst:283
-#: ../../source/benchmark/speed_benchmark.rst:285
-#: ../../source/benchmark/speed_benchmark.rst:287
+#: ../../source/benchmark/speed_benchmark.rst:195
+#: ../../source/benchmark/speed_benchmark.rst:197
+#: ../../source/benchmark/speed_benchmark.rst:207
+#: ../../source/benchmark/speed_benchmark.rst:209
+#: ../../source/benchmark/speed_benchmark.rst:211
+#: ../../source/benchmark/speed_benchmark.rst:213
+#: ../../source/benchmark/speed_benchmark.rst:215
+#: ../../source/benchmark/speed_benchmark.rst:217
+#: ../../source/benchmark/speed_benchmark.rst:219
+#: ../../source/benchmark/speed_benchmark.rst:221
+#: ../../source/benchmark/speed_benchmark.rst:223
+#: ../../source/benchmark/speed_benchmark.rst:225
+#: ../../source/benchmark/speed_benchmark.rst:227
+#: ../../source/benchmark/speed_benchmark.rst:229
+#: ../../source/benchmark/speed_benchmark.rst:231
+#: ../../source/benchmark/speed_benchmark.rst:233
+#: ../../source/benchmark/speed_benchmark.rst:235
+#: ../../source/benchmark/speed_benchmark.rst:237
+#: ../../source/benchmark/speed_benchmark.rst:246
+#: ../../source/benchmark/speed_benchmark.rst:248
+#: ../../source/benchmark/speed_benchmark.rst:250
+#: ../../source/benchmark/speed_benchmark.rst:252
+#: ../../source/benchmark/speed_benchmark.rst:254
+#: ../../source/benchmark/speed_benchmark.rst:256
+#: ../../source/benchmark/speed_benchmark.rst:258
+#: ../../source/benchmark/speed_benchmark.rst:260
+#: ../../source/benchmark/speed_benchmark.rst:262
+#: ../../source/benchmark/speed_benchmark.rst:264
+#: ../../source/benchmark/speed_benchmark.rst:266
+#: ../../source/benchmark/speed_benchmark.rst:268
+#: ../../source/benchmark/speed_benchmark.rst:270
+#: ../../source/benchmark/speed_benchmark.rst:272
+#: ../../source/benchmark/speed_benchmark.rst:274
+#: ../../source/benchmark/speed_benchmark.rst:276
+#: ../../source/benchmark/speed_benchmark.rst:286
+#: ../../source/benchmark/speed_benchmark.rst:288
+#: ../../source/benchmark/speed_benchmark.rst:290
+#: ../../source/benchmark/speed_benchmark.rst:292
+#: ../../source/benchmark/speed_benchmark.rst:294
 #: ../../source/benchmark/speed_benchmark.rst:296
 #: ../../source/benchmark/speed_benchmark.rst:298
+#: ../../source/benchmark/speed_benchmark.rst:300
 #: ../../source/benchmark/speed_benchmark.rst:302
+#: ../../source/benchmark/speed_benchmark.rst:304
 #: ../../source/benchmark/speed_benchmark.rst:306
+#: ../../source/benchmark/speed_benchmark.rst:308
 #: ../../source/benchmark/speed_benchmark.rst:310
-#: ../../source/benchmark/speed_benchmark.rst:318
+#: ../../source/benchmark/speed_benchmark.rst:312
+#: ../../source/benchmark/speed_benchmark.rst:314
+#: ../../source/benchmark/speed_benchmark.rst:316
+#: ../../source/benchmark/speed_benchmark.rst:326
+#: ../../source/benchmark/speed_benchmark.rst:328
+#: ../../source/benchmark/speed_benchmark.rst:330
+#: ../../source/benchmark/speed_benchmark.rst:332
+#: ../../source/benchmark/speed_benchmark.rst:334
+#: ../../source/benchmark/speed_benchmark.rst:336
+#: ../../source/benchmark/speed_benchmark.rst:338
+#: ../../source/benchmark/speed_benchmark.rst:340
+#: ../../source/benchmark/speed_benchmark.rst:342
+#: ../../source/benchmark/speed_benchmark.rst:344
+#: ../../source/benchmark/speed_benchmark.rst:346
+#: ../../source/benchmark/speed_benchmark.rst:348
 #: ../../source/benchmark/speed_benchmark.rst:350
+#: ../../source/benchmark/speed_benchmark.rst:352
 #: ../../source/benchmark/speed_benchmark.rst:354
 #: ../../source/benchmark/speed_benchmark.rst:356
+#: ../../source/benchmark/speed_benchmark.rst:358
+#: ../../source/benchmark/speed_benchmark.rst:360
 #: ../../source/benchmark/speed_benchmark.rst:362
 #: ../../source/benchmark/speed_benchmark.rst:364
+#: ../../source/benchmark/speed_benchmark.rst:366
+#: ../../source/benchmark/speed_benchmark.rst:368
 #: ../../source/benchmark/speed_benchmark.rst:370
 #: ../../source/benchmark/speed_benchmark.rst:372
+#: ../../source/benchmark/speed_benchmark.rst:383
+#: ../../source/benchmark/speed_benchmark.rst:385
+#: ../../source/benchmark/speed_benchmark.rst:387
 #: ../../source/benchmark/speed_benchmark.rst:389
+#: ../../source/benchmark/speed_benchmark.rst:391
+#: ../../source/benchmark/speed_benchmark.rst:393
 #: ../../source/benchmark/speed_benchmark.rst:395
+#: ../../source/benchmark/speed_benchmark.rst:397
+#: ../../source/benchmark/speed_benchmark.rst:399
+#: ../../source/benchmark/speed_benchmark.rst:401
+#: ../../source/benchmark/speed_benchmark.rst:403
 #: ../../source/benchmark/speed_benchmark.rst:405
-#: 088012662cc1481aa4119d7f6e097f51
+#: ../../source/benchmark/speed_benchmark.rst:407
+#: ../../source/benchmark/speed_benchmark.rst:409
+#: ../../source/benchmark/speed_benchmark.rst:411
+#: ../../source/benchmark/speed_benchmark.rst:413
+#: ../../source/benchmark/speed_benchmark.rst:422
+#: ../../source/benchmark/speed_benchmark.rst:424
+#: ../../source/benchmark/speed_benchmark.rst:426
+#: ../../source/benchmark/speed_benchmark.rst:428
+#: ../../source/benchmark/speed_benchmark.rst:430
+#: ../../source/benchmark/speed_benchmark.rst:432
+#: ../../source/benchmark/speed_benchmark.rst:434
+#: ../../source/benchmark/speed_benchmark.rst:436
+#: ../../source/benchmark/speed_benchmark.rst:438
+#: ../../source/benchmark/speed_benchmark.rst:440
+#: ../../source/benchmark/speed_benchmark.rst:442
+#: ../../source/benchmark/speed_benchmark.rst:444
+#: ../../source/benchmark/speed_benchmark.rst:446
+#: ../../source/benchmark/speed_benchmark.rst:448
+#: ../../source/benchmark/speed_benchmark.rst:450
+#: ../../source/benchmark/speed_benchmark.rst:452
+#: ../../source/benchmark/speed_benchmark.rst:454
+#: ../../source/benchmark/speed_benchmark.rst:456
+#: ../../source/benchmark/speed_benchmark.rst:458
+#: ../../source/benchmark/speed_benchmark.rst:460
+#: ../../source/benchmark/speed_benchmark.rst:462
+#: ../../source/benchmark/speed_benchmark.rst:464
+#: ../../source/benchmark/speed_benchmark.rst:466
+#: ../../source/benchmark/speed_benchmark.rst:468
+#: ../../source/benchmark/speed_benchmark.rst:481
+#: ../../source/benchmark/speed_benchmark.rst:483
+#: ../../source/benchmark/speed_benchmark.rst:485
+#: ../../source/benchmark/speed_benchmark.rst:487
+#: ../../source/benchmark/speed_benchmark.rst:489
+#: ../../source/benchmark/speed_benchmark.rst:491
+#: ../../source/benchmark/speed_benchmark.rst:493
+#: ../../source/benchmark/speed_benchmark.rst:495
+#: ../../source/benchmark/speed_benchmark.rst:497
+#: ../../source/benchmark/speed_benchmark.rst:499
+#: ../../source/benchmark/speed_benchmark.rst:501
+#: ../../source/benchmark/speed_benchmark.rst:503
+#: ../../source/benchmark/speed_benchmark.rst:505
+#: ../../source/benchmark/speed_benchmark.rst:507
+#: ../../source/benchmark/speed_benchmark.rst:509
+#: ../../source/benchmark/speed_benchmark.rst:511
+#: ../../source/benchmark/speed_benchmark.rst:523
+#: ../../source/benchmark/speed_benchmark.rst:525
+#: ../../source/benchmark/speed_benchmark.rst:527
+#: ../../source/benchmark/speed_benchmark.rst:529
+#: ../../source/benchmark/speed_benchmark.rst:531
+#: ../../source/benchmark/speed_benchmark.rst:533
+#: ../../source/benchmark/speed_benchmark.rst:535
+#: ../../source/benchmark/speed_benchmark.rst:537
+#: ../../source/benchmark/speed_benchmark.rst:539
+#: ../../source/benchmark/speed_benchmark.rst:541
+#: ../../source/benchmark/speed_benchmark.rst:543
+#: ../../source/benchmark/speed_benchmark.rst:545
+#: ../../source/benchmark/speed_benchmark.rst:549
+#: ../../source/benchmark/speed_benchmark.rst:551
+#: ../../source/benchmark/speed_benchmark.rst:553
+#: ../../source/benchmark/speed_benchmark.rst:557
+#: ../../source/benchmark/speed_benchmark.rst:559
+#: ../../source/benchmark/speed_benchmark.rst:561
+#: ../../source/benchmark/speed_benchmark.rst:565
+#: ../../source/benchmark/speed_benchmark.rst:567
+#: ../../source/benchmark/speed_benchmark.rst:569
+#: ../../source/benchmark/speed_benchmark.rst:585
+#: ../../source/benchmark/speed_benchmark.rst:589
+#: ../../source/benchmark/speed_benchmark.rst:591
+#: ../../source/benchmark/speed_benchmark.rst:597
+#: ../../source/benchmark/speed_benchmark.rst:599
+#: ../../source/benchmark/speed_benchmark.rst:605
+#: ../../source/benchmark/speed_benchmark.rst:607
+#: ../../source/benchmark/speed_benchmark.rst:626
+#: ../../source/benchmark/speed_benchmark.rst:632
+#: ../../source/benchmark/speed_benchmark.rst:642
+#: 11b2dcd21f6a43b1bfbbd5a7aaea3ec2 19c2f2475f7646b585545f47fb8cef25
+#: 2906110349e84105944420c86c6f8e14 2b0c156abdb6483b85ce4b3fec8e55ac
+#: 2b285733cf4d4370a8e72f26624a3aaa 2c10691a557b4a39af14e0aa8e14eb37
+#: 3b25b912f2aa44bc9082fcb99e3dcc1b 3d36f48f498644f992fb5fb2d1c176ec
+#: 4e0d4370b96a4ce5be99136b48b56b6a 4e68f9a0780c4d7f85f3ef1d4c23b263
+#: 6deca5be0ae6442ebad196e2b191ef54 7e20570ccc954dc483edf6522e157565
+#: 94a0dd46312f424c8d0333576d092543 99ddc7db2e254f0390d7611f90f0739b
+#: d454f94031e040e484f9727a2881dcf7 e55ec1ef57274e9b915240ccea232523
+#: f2f53c0c8931410890c76db651671db5 f6e6bb504f0942279014ecdc0b28fbfc
+#: fd0a186660b9468db1d816af63a72437
 msgid "1"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:48
-#: ../../source/benchmark/speed_benchmark.rst:56
-#: ../../source/benchmark/speed_benchmark.rst:64
-#: ../../source/benchmark/speed_benchmark.rst:72
-#: ../../source/benchmark/speed_benchmark.rst:86
-#: ../../source/benchmark/speed_benchmark.rst:94
-#: ../../source/benchmark/speed_benchmark.rst:102
-#: ../../source/benchmark/speed_benchmark.rst:110
-#: ../../source/benchmark/speed_benchmark.rst:125
-#: ../../source/benchmark/speed_benchmark.rst:133
-#: ../../source/benchmark/speed_benchmark.rst:141
-#: ../../source/benchmark/speed_benchmark.rst:149
-#: ../../source/benchmark/speed_benchmark.rst:163
-#: ../../source/benchmark/speed_benchmark.rst:171
-#: ../../source/benchmark/speed_benchmark.rst:179
-#: ../../source/benchmark/speed_benchmark.rst:187
-#: ../../source/benchmark/speed_benchmark.rst:202
-#: ../../source/benchmark/speed_benchmark.rst:210
-#: ../../source/benchmark/speed_benchmark.rst:218
-#: ../../source/benchmark/speed_benchmark.rst:226
-#: ../../source/benchmark/speed_benchmark.rst:241
-#: ../../source/benchmark/speed_benchmark.rst:249
-#: ../../source/benchmark/speed_benchmark.rst:257
-#: ../../source/benchmark/speed_benchmark.rst:265
-#: ../../source/benchmark/speed_benchmark.rst:273
-#: ../../source/benchmark/speed_benchmark.rst:281
-#: ../../source/benchmark/speed_benchmark.rst:296
-#: ../../source/benchmark/speed_benchmark.rst:300
-#: ../../source/benchmark/speed_benchmark.rst:304
-#: ../../source/benchmark/speed_benchmark.rst:308
-#: ../../source/benchmark/speed_benchmark.rst:318
-#: ../../source/benchmark/speed_benchmark.rst:320
-#: ../../source/benchmark/speed_benchmark.rst:322
-#: ../../source/benchmark/speed_benchmark.rst:324
-#: ../../source/benchmark/speed_benchmark.rst:332
+#: ../../source/benchmark/speed_benchmark.rst:49
+#: ../../source/benchmark/speed_benchmark.rst:57
+#: ../../source/benchmark/speed_benchmark.rst:65
+#: ../../source/benchmark/speed_benchmark.rst:73
+#: ../../source/benchmark/speed_benchmark.rst:88
+#: ../../source/benchmark/speed_benchmark.rst:96
+#: ../../source/benchmark/speed_benchmark.rst:104
+#: ../../source/benchmark/speed_benchmark.rst:112
+#: ../../source/benchmark/speed_benchmark.rst:128
+#: ../../source/benchmark/speed_benchmark.rst:136
+#: ../../source/benchmark/speed_benchmark.rst:144
+#: ../../source/benchmark/speed_benchmark.rst:152
+#: ../../source/benchmark/speed_benchmark.rst:167
+#: ../../source/benchmark/speed_benchmark.rst:175
+#: ../../source/benchmark/speed_benchmark.rst:183
+#: ../../source/benchmark/speed_benchmark.rst:191
+#: ../../source/benchmark/speed_benchmark.rst:207
+#: ../../source/benchmark/speed_benchmark.rst:215
+#: ../../source/benchmark/speed_benchmark.rst:223
+#: ../../source/benchmark/speed_benchmark.rst:231
+#: ../../source/benchmark/speed_benchmark.rst:246
+#: ../../source/benchmark/speed_benchmark.rst:254
+#: ../../source/benchmark/speed_benchmark.rst:262
+#: ../../source/benchmark/speed_benchmark.rst:270
+#: ../../source/benchmark/speed_benchmark.rst:286
+#: ../../source/benchmark/speed_benchmark.rst:294
+#: ../../source/benchmark/speed_benchmark.rst:302
+#: ../../source/benchmark/speed_benchmark.rst:310
+#: ../../source/benchmark/speed_benchmark.rst:326
 #: ../../source/benchmark/speed_benchmark.rst:334
-#: ../../source/benchmark/speed_benchmark.rst:336
-#: ../../source/benchmark/speed_benchmark.rst:338
+#: ../../source/benchmark/speed_benchmark.rst:342
 #: ../../source/benchmark/speed_benchmark.rst:350
 #: ../../source/benchmark/speed_benchmark.rst:358
 #: ../../source/benchmark/speed_benchmark.rst:366
-#: ../../source/benchmark/speed_benchmark.rst:374
-#: ../../source/benchmark/speed_benchmark.rst:389
+#: ../../source/benchmark/speed_benchmark.rst:383
 #: ../../source/benchmark/speed_benchmark.rst:391
-#: ../../source/benchmark/speed_benchmark.rst:401
-#: ../../source/benchmark/speed_benchmark.rst:411
-#: ../../source/benchmark/speed_benchmark.rst:419
-#: ../../source/benchmark/speed_benchmark.rst:427
-#: ../../source/benchmark/speed_benchmark.rst:435
-#: ../../source/benchmark/speed_benchmark.rst:443
-#: 005cece541024c82832f5c8b5f1887a5
+#: ../../source/benchmark/speed_benchmark.rst:399
+#: ../../source/benchmark/speed_benchmark.rst:407
+#: ../../source/benchmark/speed_benchmark.rst:422
+#: ../../source/benchmark/speed_benchmark.rst:430
+#: ../../source/benchmark/speed_benchmark.rst:438
+#: ../../source/benchmark/speed_benchmark.rst:446
+#: ../../source/benchmark/speed_benchmark.rst:454
+#: ../../source/benchmark/speed_benchmark.rst:462
+#: ../../source/benchmark/speed_benchmark.rst:481
+#: ../../source/benchmark/speed_benchmark.rst:489
+#: ../../source/benchmark/speed_benchmark.rst:497
+#: ../../source/benchmark/speed_benchmark.rst:505
+#: ../../source/benchmark/speed_benchmark.rst:523
+#: ../../source/benchmark/speed_benchmark.rst:531
+#: ../../source/benchmark/speed_benchmark.rst:539
+#: ../../source/benchmark/speed_benchmark.rst:547
+#: ../../source/benchmark/speed_benchmark.rst:555
+#: ../../source/benchmark/speed_benchmark.rst:563
+#: ../../source/benchmark/speed_benchmark.rst:585
+#: ../../source/benchmark/speed_benchmark.rst:593
+#: ../../source/benchmark/speed_benchmark.rst:601
+#: ../../source/benchmark/speed_benchmark.rst:609
+#: ../../source/benchmark/speed_benchmark.rst:626
+#: ../../source/benchmark/speed_benchmark.rst:628
+#: ../../source/benchmark/speed_benchmark.rst:638
+#: ../../source/benchmark/speed_benchmark.rst:648
+#: ../../source/benchmark/speed_benchmark.rst:656
+#: ../../source/benchmark/speed_benchmark.rst:664
+#: ../../source/benchmark/speed_benchmark.rst:672
+#: 80de1236199d47ee97b69d1059fc5ee1
 msgid "BF16"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:48
-#: fa45d4fccc0d44c2800f972d2630a14f
-msgid "49.94"
+#: ../../source/benchmark/speed_benchmark.rst:49
+#: 0001a574c6a14ddc939dd37f515471c3
+msgid "47.40"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:48
-#: 4eb0e83451fc4f93a061d148039df0de
-msgid "1.17"
+#: ../../source/benchmark/speed_benchmark.rst:49
+#: 492b0964f7ca45a0a7a3eb5630fab76b
+msgid "0.97"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:50
-#: ../../source/benchmark/speed_benchmark.rst:58
-#: ../../source/benchmark/speed_benchmark.rst:66
-#: ../../source/benchmark/speed_benchmark.rst:74
-#: ../../source/benchmark/speed_benchmark.rst:88
-#: ../../source/benchmark/speed_benchmark.rst:96
-#: ../../source/benchmark/speed_benchmark.rst:104
-#: ../../source/benchmark/speed_benchmark.rst:112
-#: ../../source/benchmark/speed_benchmark.rst:127
-#: ../../source/benchmark/speed_benchmark.rst:135
-#: ../../source/benchmark/speed_benchmark.rst:143
-#: ../../source/benchmark/speed_benchmark.rst:151
-#: ../../source/benchmark/speed_benchmark.rst:165
-#: ../../source/benchmark/speed_benchmark.rst:173
-#: ../../source/benchmark/speed_benchmark.rst:181
-#: ../../source/benchmark/speed_benchmark.rst:189
-#: ../../source/benchmark/speed_benchmark.rst:204
-#: ../../source/benchmark/speed_benchmark.rst:212
-#: ../../source/benchmark/speed_benchmark.rst:220
-#: ../../source/benchmark/speed_benchmark.rst:228
-#: ../../source/benchmark/speed_benchmark.rst:243
-#: ../../source/benchmark/speed_benchmark.rst:251
-#: ../../source/benchmark/speed_benchmark.rst:259
-#: ../../source/benchmark/speed_benchmark.rst:267
-#: ../../source/benchmark/speed_benchmark.rst:275
-#: ../../source/benchmark/speed_benchmark.rst:283
+#: ../../source/benchmark/speed_benchmark.rst:51
+#: ../../source/benchmark/speed_benchmark.rst:59
+#: ../../source/benchmark/speed_benchmark.rst:67
+#: ../../source/benchmark/speed_benchmark.rst:75
+#: ../../source/benchmark/speed_benchmark.rst:90
+#: ../../source/benchmark/speed_benchmark.rst:98
+#: ../../source/benchmark/speed_benchmark.rst:106
+#: ../../source/benchmark/speed_benchmark.rst:114
+#: ../../source/benchmark/speed_benchmark.rst:130
+#: ../../source/benchmark/speed_benchmark.rst:138
+#: ../../source/benchmark/speed_benchmark.rst:146
+#: ../../source/benchmark/speed_benchmark.rst:154
+#: ../../source/benchmark/speed_benchmark.rst:169
+#: ../../source/benchmark/speed_benchmark.rst:177
+#: ../../source/benchmark/speed_benchmark.rst:185
+#: ../../source/benchmark/speed_benchmark.rst:193
+#: ../../source/benchmark/speed_benchmark.rst:209
+#: ../../source/benchmark/speed_benchmark.rst:217
+#: ../../source/benchmark/speed_benchmark.rst:225
+#: ../../source/benchmark/speed_benchmark.rst:233
+#: ../../source/benchmark/speed_benchmark.rst:248
+#: ../../source/benchmark/speed_benchmark.rst:256
+#: ../../source/benchmark/speed_benchmark.rst:264
+#: ../../source/benchmark/speed_benchmark.rst:272
+#: ../../source/benchmark/speed_benchmark.rst:288
+#: ../../source/benchmark/speed_benchmark.rst:296
+#: ../../source/benchmark/speed_benchmark.rst:304
+#: ../../source/benchmark/speed_benchmark.rst:312
+#: ../../source/benchmark/speed_benchmark.rst:328
+#: ../../source/benchmark/speed_benchmark.rst:336
+#: ../../source/benchmark/speed_benchmark.rst:344
 #: ../../source/benchmark/speed_benchmark.rst:352
 #: ../../source/benchmark/speed_benchmark.rst:360
 #: ../../source/benchmark/speed_benchmark.rst:368
-#: ../../source/benchmark/speed_benchmark.rst:376
+#: ../../source/benchmark/speed_benchmark.rst:385
 #: ../../source/benchmark/speed_benchmark.rst:393
-#: ../../source/benchmark/speed_benchmark.rst:403
-#: ../../source/benchmark/speed_benchmark.rst:413
-#: ../../source/benchmark/speed_benchmark.rst:421
-#: ../../source/benchmark/speed_benchmark.rst:429
-#: ../../source/benchmark/speed_benchmark.rst:437
-#: ../../source/benchmark/speed_benchmark.rst:445
-#: 360e69153b484c13a7b341af2104245e
+#: ../../source/benchmark/speed_benchmark.rst:401
+#: ../../source/benchmark/speed_benchmark.rst:409
+#: ../../source/benchmark/speed_benchmark.rst:424
+#: ../../source/benchmark/speed_benchmark.rst:432
+#: ../../source/benchmark/speed_benchmark.rst:440
+#: ../../source/benchmark/speed_benchmark.rst:448
+#: ../../source/benchmark/speed_benchmark.rst:456
+#: ../../source/benchmark/speed_benchmark.rst:464
+#: ../../source/benchmark/speed_benchmark.rst:483
+#: ../../source/benchmark/speed_benchmark.rst:491
+#: ../../source/benchmark/speed_benchmark.rst:499
+#: ../../source/benchmark/speed_benchmark.rst:507
+#: ../../source/benchmark/speed_benchmark.rst:525
+#: ../../source/benchmark/speed_benchmark.rst:533
+#: ../../source/benchmark/speed_benchmark.rst:541
+#: ../../source/benchmark/speed_benchmark.rst:549
+#: ../../source/benchmark/speed_benchmark.rst:557
+#: ../../source/benchmark/speed_benchmark.rst:565
+#: ../../source/benchmark/speed_benchmark.rst:587
+#: ../../source/benchmark/speed_benchmark.rst:595
+#: ../../source/benchmark/speed_benchmark.rst:603
+#: ../../source/benchmark/speed_benchmark.rst:611
+#: ../../source/benchmark/speed_benchmark.rst:630
+#: ../../source/benchmark/speed_benchmark.rst:640
+#: ../../source/benchmark/speed_benchmark.rst:650
+#: ../../source/benchmark/speed_benchmark.rst:658
+#: ../../source/benchmark/speed_benchmark.rst:666
+#: ../../source/benchmark/speed_benchmark.rst:674
+#: 87ed274bcbda417cb8853fb427e297be
 msgid "GPTQ-Int8"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:50
-#: 937ae93383b649c4a9676caa076e2dfd
-msgid "36.35"
+#: ../../source/benchmark/speed_benchmark.rst:51
+#: 0489903c73e345218d5addaedc224046
+msgid "35.17"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:50
-#: 54bc220ea7dd4876b029adefc06e51b8
-msgid "0.85"
+#: ../../source/benchmark/speed_benchmark.rst:51
+#: 3f7d12b997c84246b6e1c2848982f9b6
+msgid "0.64"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:52
-#: ../../source/benchmark/speed_benchmark.rst:60
-#: ../../source/benchmark/speed_benchmark.rst:68
-#: ../../source/benchmark/speed_benchmark.rst:76
-#: ../../source/benchmark/speed_benchmark.rst:90
-#: ../../source/benchmark/speed_benchmark.rst:98
-#: ../../source/benchmark/speed_benchmark.rst:106
-#: ../../source/benchmark/speed_benchmark.rst:114
-#: ../../source/benchmark/speed_benchmark.rst:129
-#: ../../source/benchmark/speed_benchmark.rst:137
-#: ../../source/benchmark/speed_benchmark.rst:145
-#: ../../source/benchmark/speed_benchmark.rst:153
-#: ../../source/benchmark/speed_benchmark.rst:167
-#: ../../source/benchmark/speed_benchmark.rst:175
-#: ../../source/benchmark/speed_benchmark.rst:183
-#: ../../source/benchmark/speed_benchmark.rst:191
-#: ../../source/benchmark/speed_benchmark.rst:206
-#: ../../source/benchmark/speed_benchmark.rst:214
-#: ../../source/benchmark/speed_benchmark.rst:222
-#: ../../source/benchmark/speed_benchmark.rst:230
-#: ../../source/benchmark/speed_benchmark.rst:245
-#: ../../source/benchmark/speed_benchmark.rst:253
-#: ../../source/benchmark/speed_benchmark.rst:261
-#: ../../source/benchmark/speed_benchmark.rst:269
-#: ../../source/benchmark/speed_benchmark.rst:277
-#: ../../source/benchmark/speed_benchmark.rst:285
+#: ../../source/benchmark/speed_benchmark.rst:51
+#: ../../source/benchmark/speed_benchmark.rst:59
+#: ../../source/benchmark/speed_benchmark.rst:67
+#: ../../source/benchmark/speed_benchmark.rst:75
+#: ../../source/benchmark/speed_benchmark.rst:130
+#: ../../source/benchmark/speed_benchmark.rst:138
+#: ../../source/benchmark/speed_benchmark.rst:146
+#: ../../source/benchmark/speed_benchmark.rst:154
+#: ../../source/benchmark/speed_benchmark.rst:209
+#: ../../source/benchmark/speed_benchmark.rst:217
+#: ../../source/benchmark/speed_benchmark.rst:225
+#: ../../source/benchmark/speed_benchmark.rst:233
+#: ../../source/benchmark/speed_benchmark.rst:288
+#: ../../source/benchmark/speed_benchmark.rst:296
+#: ../../source/benchmark/speed_benchmark.rst:304
+#: ../../source/benchmark/speed_benchmark.rst:312
+#: ../../source/benchmark/speed_benchmark.rst:385
+#: ../../source/benchmark/speed_benchmark.rst:393
+#: ../../source/benchmark/speed_benchmark.rst:401
+#: ../../source/benchmark/speed_benchmark.rst:409
+#: ../../source/benchmark/speed_benchmark.rst:483
+#: ../../source/benchmark/speed_benchmark.rst:491
+#: ../../source/benchmark/speed_benchmark.rst:499
+#: ../../source/benchmark/speed_benchmark.rst:507
+#: ../../source/benchmark/speed_benchmark.rst:587
+#: ../../source/benchmark/speed_benchmark.rst:595
+#: ../../source/benchmark/speed_benchmark.rst:603
+#: ../../source/benchmark/speed_benchmark.rst:611
+#: 84525916476247c8b05c566ab47af083
+msgid "auto_gptq==0.6.0+cu1210"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:53
+#: ../../source/benchmark/speed_benchmark.rst:61
+#: ../../source/benchmark/speed_benchmark.rst:69
+#: ../../source/benchmark/speed_benchmark.rst:77
+#: ../../source/benchmark/speed_benchmark.rst:92
+#: ../../source/benchmark/speed_benchmark.rst:100
+#: ../../source/benchmark/speed_benchmark.rst:108
+#: ../../source/benchmark/speed_benchmark.rst:116
+#: ../../source/benchmark/speed_benchmark.rst:132
+#: ../../source/benchmark/speed_benchmark.rst:140
+#: ../../source/benchmark/speed_benchmark.rst:148
+#: ../../source/benchmark/speed_benchmark.rst:156
+#: ../../source/benchmark/speed_benchmark.rst:171
+#: ../../source/benchmark/speed_benchmark.rst:179
+#: ../../source/benchmark/speed_benchmark.rst:187
+#: ../../source/benchmark/speed_benchmark.rst:195
+#: ../../source/benchmark/speed_benchmark.rst:211
+#: ../../source/benchmark/speed_benchmark.rst:219
+#: ../../source/benchmark/speed_benchmark.rst:227
+#: ../../source/benchmark/speed_benchmark.rst:235
+#: ../../source/benchmark/speed_benchmark.rst:250
+#: ../../source/benchmark/speed_benchmark.rst:258
+#: ../../source/benchmark/speed_benchmark.rst:266
+#: ../../source/benchmark/speed_benchmark.rst:274
+#: ../../source/benchmark/speed_benchmark.rst:290
 #: ../../source/benchmark/speed_benchmark.rst:298
-#: ../../source/benchmark/speed_benchmark.rst:302
 #: ../../source/benchmark/speed_benchmark.rst:306
-#: ../../source/benchmark/speed_benchmark.rst:310
+#: ../../source/benchmark/speed_benchmark.rst:314
+#: ../../source/benchmark/speed_benchmark.rst:330
+#: ../../source/benchmark/speed_benchmark.rst:338
+#: ../../source/benchmark/speed_benchmark.rst:346
 #: ../../source/benchmark/speed_benchmark.rst:354
 #: ../../source/benchmark/speed_benchmark.rst:362
 #: ../../source/benchmark/speed_benchmark.rst:370
-#: ../../source/benchmark/speed_benchmark.rst:378
+#: ../../source/benchmark/speed_benchmark.rst:387
 #: ../../source/benchmark/speed_benchmark.rst:395
-#: ../../source/benchmark/speed_benchmark.rst:397
-#: ../../source/benchmark/speed_benchmark.rst:405
-#: ../../source/benchmark/speed_benchmark.rst:407
-#: ../../source/benchmark/speed_benchmark.rst:415
-#: ../../source/benchmark/speed_benchmark.rst:423
-#: ../../source/benchmark/speed_benchmark.rst:431
-#: ../../source/benchmark/speed_benchmark.rst:439
-#: ../../source/benchmark/speed_benchmark.rst:447
-#: 16599c94f7314c0c9605b2f3a4c69d8f
+#: ../../source/benchmark/speed_benchmark.rst:403
+#: ../../source/benchmark/speed_benchmark.rst:411
+#: ../../source/benchmark/speed_benchmark.rst:426
+#: ../../source/benchmark/speed_benchmark.rst:434
+#: ../../source/benchmark/speed_benchmark.rst:442
+#: ../../source/benchmark/speed_benchmark.rst:450
+#: ../../source/benchmark/speed_benchmark.rst:458
+#: ../../source/benchmark/speed_benchmark.rst:466
+#: ../../source/benchmark/speed_benchmark.rst:485
+#: ../../source/benchmark/speed_benchmark.rst:493
+#: ../../source/benchmark/speed_benchmark.rst:501
+#: ../../source/benchmark/speed_benchmark.rst:509
+#: ../../source/benchmark/speed_benchmark.rst:527
+#: ../../source/benchmark/speed_benchmark.rst:535
+#: ../../source/benchmark/speed_benchmark.rst:543
+#: ../../source/benchmark/speed_benchmark.rst:551
+#: ../../source/benchmark/speed_benchmark.rst:559
+#: ../../source/benchmark/speed_benchmark.rst:567
+#: ../../source/benchmark/speed_benchmark.rst:589
+#: ../../source/benchmark/speed_benchmark.rst:597
+#: ../../source/benchmark/speed_benchmark.rst:605
+#: ../../source/benchmark/speed_benchmark.rst:613
+#: ../../source/benchmark/speed_benchmark.rst:632
+#: ../../source/benchmark/speed_benchmark.rst:634
+#: ../../source/benchmark/speed_benchmark.rst:642
+#: ../../source/benchmark/speed_benchmark.rst:644
+#: ../../source/benchmark/speed_benchmark.rst:652
+#: ../../source/benchmark/speed_benchmark.rst:660
+#: ../../source/benchmark/speed_benchmark.rst:668
+#: ../../source/benchmark/speed_benchmark.rst:676
+#: 012d772972364e4796bc37b10e53ec16
 msgid "GPTQ-Int4"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:52
-#: ../../source/benchmark/speed_benchmark.rst:64
-#: 3986a4bb478443d6a8b6d2316691873a
-msgid "49.56"
+#: ../../source/benchmark/speed_benchmark.rst:53
+#: 266958f9dda14a7dbec9c28d9d53b19e
+msgid "50.60"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:52
-#: ../../source/benchmark/speed_benchmark.rst:54
-#: 27651c408d4346628ed094125b3b9aa5
-msgid "0.68"
+#: ../../source/benchmark/speed_benchmark.rst:53
+#: 80bc084e1d8d44a39e70fa8f97f0d455
+msgid "0.48"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:54
-#: ../../source/benchmark/speed_benchmark.rst:62
-#: ../../source/benchmark/speed_benchmark.rst:70
-#: ../../source/benchmark/speed_benchmark.rst:78
-#: ../../source/benchmark/speed_benchmark.rst:92
-#: ../../source/benchmark/speed_benchmark.rst:100
-#: ../../source/benchmark/speed_benchmark.rst:108
-#: ../../source/benchmark/speed_benchmark.rst:116
-#: ../../source/benchmark/speed_benchmark.rst:131
-#: ../../source/benchmark/speed_benchmark.rst:139
-#: ../../source/benchmark/speed_benchmark.rst:147
-#: ../../source/benchmark/speed_benchmark.rst:155
-#: ../../source/benchmark/speed_benchmark.rst:169
-#: ../../source/benchmark/speed_benchmark.rst:177
-#: ../../source/benchmark/speed_benchmark.rst:185
-#: ../../source/benchmark/speed_benchmark.rst:193
-#: ../../source/benchmark/speed_benchmark.rst:208
-#: ../../source/benchmark/speed_benchmark.rst:216
-#: ../../source/benchmark/speed_benchmark.rst:224
-#: ../../source/benchmark/speed_benchmark.rst:232
-#: ../../source/benchmark/speed_benchmark.rst:247
-#: ../../source/benchmark/speed_benchmark.rst:255
-#: ../../source/benchmark/speed_benchmark.rst:263
-#: ../../source/benchmark/speed_benchmark.rst:271
-#: ../../source/benchmark/speed_benchmark.rst:279
-#: ../../source/benchmark/speed_benchmark.rst:287
+#: ../../source/benchmark/speed_benchmark.rst:55
+#: ../../source/benchmark/speed_benchmark.rst:63
+#: ../../source/benchmark/speed_benchmark.rst:71
+#: ../../source/benchmark/speed_benchmark.rst:79
+#: ../../source/benchmark/speed_benchmark.rst:94
+#: ../../source/benchmark/speed_benchmark.rst:102
+#: ../../source/benchmark/speed_benchmark.rst:110
+#: ../../source/benchmark/speed_benchmark.rst:118
+#: ../../source/benchmark/speed_benchmark.rst:134
+#: ../../source/benchmark/speed_benchmark.rst:142
+#: ../../source/benchmark/speed_benchmark.rst:150
+#: ../../source/benchmark/speed_benchmark.rst:158
+#: ../../source/benchmark/speed_benchmark.rst:173
+#: ../../source/benchmark/speed_benchmark.rst:181
+#: ../../source/benchmark/speed_benchmark.rst:189
+#: ../../source/benchmark/speed_benchmark.rst:197
+#: ../../source/benchmark/speed_benchmark.rst:213
+#: ../../source/benchmark/speed_benchmark.rst:221
+#: ../../source/benchmark/speed_benchmark.rst:229
+#: ../../source/benchmark/speed_benchmark.rst:237
+#: ../../source/benchmark/speed_benchmark.rst:252
+#: ../../source/benchmark/speed_benchmark.rst:260
+#: ../../source/benchmark/speed_benchmark.rst:268
+#: ../../source/benchmark/speed_benchmark.rst:276
+#: ../../source/benchmark/speed_benchmark.rst:292
+#: ../../source/benchmark/speed_benchmark.rst:300
+#: ../../source/benchmark/speed_benchmark.rst:308
+#: ../../source/benchmark/speed_benchmark.rst:316
+#: ../../source/benchmark/speed_benchmark.rst:332
+#: ../../source/benchmark/speed_benchmark.rst:340
+#: ../../source/benchmark/speed_benchmark.rst:348
 #: ../../source/benchmark/speed_benchmark.rst:356
 #: ../../source/benchmark/speed_benchmark.rst:364
 #: ../../source/benchmark/speed_benchmark.rst:372
-#: ../../source/benchmark/speed_benchmark.rst:380
-#: ../../source/benchmark/speed_benchmark.rst:399
-#: ../../source/benchmark/speed_benchmark.rst:409
-#: ../../source/benchmark/speed_benchmark.rst:417
-#: ../../source/benchmark/speed_benchmark.rst:425
-#: ../../source/benchmark/speed_benchmark.rst:433
-#: ../../source/benchmark/speed_benchmark.rst:441
-#: ../../source/benchmark/speed_benchmark.rst:449
-#: c04b5415b5a44246bbadbf885865c319
+#: ../../source/benchmark/speed_benchmark.rst:389
+#: ../../source/benchmark/speed_benchmark.rst:397
+#: ../../source/benchmark/speed_benchmark.rst:405
+#: ../../source/benchmark/speed_benchmark.rst:413
+#: ../../source/benchmark/speed_benchmark.rst:428
+#: ../../source/benchmark/speed_benchmark.rst:436
+#: ../../source/benchmark/speed_benchmark.rst:444
+#: ../../source/benchmark/speed_benchmark.rst:452
+#: ../../source/benchmark/speed_benchmark.rst:460
+#: ../../source/benchmark/speed_benchmark.rst:468
+#: ../../source/benchmark/speed_benchmark.rst:487
+#: ../../source/benchmark/speed_benchmark.rst:495
+#: ../../source/benchmark/speed_benchmark.rst:503
+#: ../../source/benchmark/speed_benchmark.rst:511
+#: ../../source/benchmark/speed_benchmark.rst:529
+#: ../../source/benchmark/speed_benchmark.rst:537
+#: ../../source/benchmark/speed_benchmark.rst:545
+#: ../../source/benchmark/speed_benchmark.rst:553
+#: ../../source/benchmark/speed_benchmark.rst:561
+#: ../../source/benchmark/speed_benchmark.rst:569
+#: ../../source/benchmark/speed_benchmark.rst:591
+#: ../../source/benchmark/speed_benchmark.rst:599
+#: ../../source/benchmark/speed_benchmark.rst:607
+#: ../../source/benchmark/speed_benchmark.rst:615
+#: ../../source/benchmark/speed_benchmark.rst:636
+#: ../../source/benchmark/speed_benchmark.rst:646
+#: ../../source/benchmark/speed_benchmark.rst:654
+#: ../../source/benchmark/speed_benchmark.rst:662
+#: ../../source/benchmark/speed_benchmark.rst:670
+#: ../../source/benchmark/speed_benchmark.rst:678
+#: b4f5d52a0a1f4d32ba489b80bc0cfcf5
 msgid "AWQ"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:54
-#: c7854ac0df17462398f1f76a46b3ceb1
-msgid "38.78"
+#: ../../source/benchmark/speed_benchmark.rst:55
+#: d7f6be27b48e4fa697ed1e1ad753b283
+msgid "37.09"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:56
-#: ../../source/benchmark/speed_benchmark.rst:94
-#: ../../source/benchmark/speed_benchmark.rst:133
-#: ../../source/benchmark/speed_benchmark.rst:171
-#: ../../source/benchmark/speed_benchmark.rst:210
-#: ../../source/benchmark/speed_benchmark.rst:249
-#: ../../source/benchmark/speed_benchmark.rst:300
-#: ../../source/benchmark/speed_benchmark.rst:320
-#: ../../source/benchmark/speed_benchmark.rst:358
-#: ../../source/benchmark/speed_benchmark.rst:401
-#: 5af45da3133a4d77889a92dd902810fc
+#: ../../source/benchmark/speed_benchmark.rst:55
+#: b5a162217757412797b13717c75134ac
+msgid "0.68"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:57
+#: ../../source/benchmark/speed_benchmark.rst:96
+#: ../../source/benchmark/speed_benchmark.rst:136
+#: ../../source/benchmark/speed_benchmark.rst:175
+#: ../../source/benchmark/speed_benchmark.rst:215
+#: ../../source/benchmark/speed_benchmark.rst:254
+#: ../../source/benchmark/speed_benchmark.rst:294
+#: ../../source/benchmark/speed_benchmark.rst:334
+#: ../../source/benchmark/speed_benchmark.rst:391
+#: ../../source/benchmark/speed_benchmark.rst:430
+#: ../../source/benchmark/speed_benchmark.rst:489
+#: ../../source/benchmark/speed_benchmark.rst:531
+#: ../../source/benchmark/speed_benchmark.rst:593
+#: ../../source/benchmark/speed_benchmark.rst:638
+#: 3f1d70cfb2044b51ab98de4f267d5531
 msgid "6144"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:56
-#: fc1783a72ac548478ee1012f058540cf
-msgid "50.83"
+#: ../../source/benchmark/speed_benchmark.rst:57
+#: f835247e9829471d96d12b03a72ef8d2
+msgid "47.45"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:56
-#: 6234760eccdf45a8b46bb9b7f9934cc7
-msgid "6.42"
+#: ../../source/benchmark/speed_benchmark.rst:57
+#: cf9b11d6eec34ef9ab7b2c4f18a20f30
+msgid "1.23"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:58
-#: 81e100aa291f4a69bdb8fbbfbd1c756c
-msgid "36.56"
+#: ../../source/benchmark/speed_benchmark.rst:59
+#: 6afd86f6e2514af08835e20cc299c4ae
+msgid "36.47"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:58
-#: bd4eaeffae8b4b95b65a1510c0ec72af
-msgid "6.09"
+#: ../../source/benchmark/speed_benchmark.rst:59
+#: 84c346f0591246dd8ad8fb5df4baab2e
+msgid "0.90"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:60
-#: 4d49282ef20744d683c57940b1ba28ae
-msgid "49.63"
+#: ../../source/benchmark/speed_benchmark.rst:61
+#: bf89dbb482e140a6a41bf7840ee48bcc
+msgid "48.89"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:60
-#: ../../source/benchmark/speed_benchmark.rst:208
-#: ../../source/benchmark/speed_benchmark.rst:360
-#: e0a48a78d7ea46b3b2cffdaba0e9d5f9
-msgid "5.93"
+#: ../../source/benchmark/speed_benchmark.rst:61
+#: d761d9a27829450f8dbfb0302994a349
+msgid "0.73"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:62
-#: 3b516aa67f8a4719bd2ea63babf07c15
-msgid "38.73"
+#: ../../source/benchmark/speed_benchmark.rst:63
+#: 376d12a215874e979289fb21a1a17eb6
+msgid "37.04"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:62
-#: 2dba1cef974e4eceb3cb96338c962a41
-msgid "5.92"
+#: ../../source/benchmark/speed_benchmark.rst:63
+#: 9988093e57b64c35a5caf93c456d28e2
+msgid "0.72"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:64
-#: ../../source/benchmark/speed_benchmark.rst:102
-#: ../../source/benchmark/speed_benchmark.rst:141
-#: ../../source/benchmark/speed_benchmark.rst:179
-#: ../../source/benchmark/speed_benchmark.rst:218
-#: ../../source/benchmark/speed_benchmark.rst:257
-#: ../../source/benchmark/speed_benchmark.rst:304
-#: ../../source/benchmark/speed_benchmark.rst:322
-#: ../../source/benchmark/speed_benchmark.rst:366
-#: ../../source/benchmark/speed_benchmark.rst:411
-#: 4d9a02ffd625473ea24f167b93b22e31
+#: ../../source/benchmark/speed_benchmark.rst:65
+#: ../../source/benchmark/speed_benchmark.rst:104
+#: ../../source/benchmark/speed_benchmark.rst:144
+#: ../../source/benchmark/speed_benchmark.rst:183
+#: ../../source/benchmark/speed_benchmark.rst:223
+#: ../../source/benchmark/speed_benchmark.rst:262
+#: ../../source/benchmark/speed_benchmark.rst:302
+#: ../../source/benchmark/speed_benchmark.rst:342
+#: ../../source/benchmark/speed_benchmark.rst:399
+#: ../../source/benchmark/speed_benchmark.rst:438
+#: ../../source/benchmark/speed_benchmark.rst:497
+#: ../../source/benchmark/speed_benchmark.rst:539
+#: ../../source/benchmark/speed_benchmark.rst:601
+#: ../../source/benchmark/speed_benchmark.rst:648
+#: f34f6a89b0c5484187c76a2c09633eaa
 msgid "14336"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:64
-#: 9469dd8432ef400ca83ec82378ed742c
-msgid "13.48"
+#: ../../source/benchmark/speed_benchmark.rst:65
+#: dcdcd2d8dd964b08858bd624e9f2689c
+msgid "47.11"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:66
-#: 5e888acd738b403280fcc366d84bd968
-msgid "36.23"
+#: ../../source/benchmark/speed_benchmark.rst:65
+#: 14fc365eeb3440588536d6a9d6c82388
+msgid "1.60"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:66
-#: 576263eb00ff4410838b596a671b32ca
-msgid "13.15"
+#: ../../source/benchmark/speed_benchmark.rst:67
+#: d0c28e4ea5994f849a8d4e7e55ff9cbb
+msgid "35.44"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:68
-#: b1b3af71d6014981a26d9b93dc746830
-msgid "48.68"
+#: ../../source/benchmark/speed_benchmark.rst:67
+#: 53b0a657215c4ae2880d11d73770620d
+msgid "1.26"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:68
-#: ccb41af5c955448b9e86b13cb8032915
-msgid "12.97"
+#: ../../source/benchmark/speed_benchmark.rst:69
+#: 5a24b968f695490b87c8e17d3741c575
+msgid "48.26"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:70
-#: 70ad05614e664d58acd34db476124c98
-msgid "38.94"
+#: ../../source/benchmark/speed_benchmark.rst:69
+#: ../../source/benchmark/speed_benchmark.rst:71
+#: 33c627b63c8646c380c661142aa254cf 849e830023f7430aa8017fe634e1aa42
+msgid "1.10"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:70
-#: 21f5daf199a54d5baa68038b1de87035
-msgid "12.99"
+#: ../../source/benchmark/speed_benchmark.rst:71
+#: c06e0f5f3ce64f10bee423985cdd2dab
+msgid "37.14"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:72
-#: ../../source/benchmark/speed_benchmark.rst:110
-#: ../../source/benchmark/speed_benchmark.rst:149
-#: ../../source/benchmark/speed_benchmark.rst:187
-#: ../../source/benchmark/speed_benchmark.rst:226
-#: ../../source/benchmark/speed_benchmark.rst:265
-#: ../../source/benchmark/speed_benchmark.rst:308
-#: ../../source/benchmark/speed_benchmark.rst:324
-#: ../../source/benchmark/speed_benchmark.rst:374
-#: ../../source/benchmark/speed_benchmark.rst:419
-#: ../../source/benchmark/speed_benchmark.rst:427
-#: 66434d21ca3c4500b21c610d842fea26
+#: ../../source/benchmark/speed_benchmark.rst:73
+#: ../../source/benchmark/speed_benchmark.rst:112
+#: ../../source/benchmark/speed_benchmark.rst:152
+#: ../../source/benchmark/speed_benchmark.rst:191
+#: ../../source/benchmark/speed_benchmark.rst:231
+#: ../../source/benchmark/speed_benchmark.rst:270
+#: ../../source/benchmark/speed_benchmark.rst:310
+#: ../../source/benchmark/speed_benchmark.rst:350
+#: ../../source/benchmark/speed_benchmark.rst:407
+#: ../../source/benchmark/speed_benchmark.rst:446
+#: ../../source/benchmark/speed_benchmark.rst:505
+#: ../../source/benchmark/speed_benchmark.rst:547
+#: ../../source/benchmark/speed_benchmark.rst:609
+#: ../../source/benchmark/speed_benchmark.rst:656
+#: 0a176a7e0dcf4ff7ae701bc5e7d587bb
 msgid "30720"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:72
-#: 6caac0e92a0c4878ab182276f59b1a5a
-msgid "49.25"
+#: ../../source/benchmark/speed_benchmark.rst:73
+#: b7bfccab32e24ce28ea83aae735d1550
+msgid "47.16"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:72
-#: ../../source/benchmark/speed_benchmark.rst:224
-#: bd358e18689d4d7e958a7c40b95f54c9
-msgid "27.61"
+#: ../../source/benchmark/speed_benchmark.rst:73
+#: 33100594b9124b019077a76333601aef
+msgid "2.34"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:74
-#: b6534e4b6d2e49a9b52f49136ff4160b
-msgid "34.61"
+#: ../../source/benchmark/speed_benchmark.rst:75
+#: ef717bccfd9c4033ad6dcd0709881410
+msgid "36.25"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:74
-#: 95f412d519524eeeaed30c404cfbe853
-msgid "27.28"
+#: ../../source/benchmark/speed_benchmark.rst:75
+#: 880ed5446cf64429b0fa0e6c1fd3a898
+msgid "2.01"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:76
-#: cbf81c32f81e4727aa522a1776d3e5d4
-msgid "48.18"
+#: ../../source/benchmark/speed_benchmark.rst:77
+#: e4d1aec476bd48838c00f3dae53a2969
+msgid "49.22"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:76
-#: cb08e94a34884e0189889838ec3ae24d
-msgid "27.12"
+#: ../../source/benchmark/speed_benchmark.rst:77
+#: 5ad95205e60b4c8d82c2f475db16864f
+msgid "1.85"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:78
-#: 3885fdf7986847ee9e2e2a515b35708e
-msgid "38.19"
+#: ../../source/benchmark/speed_benchmark.rst:79
+#: db0ad21e1eeb4b4da42aba2fa67bd3ab
+msgid "36.90"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:78
-#: 00160d4e4a8a4b7281d69a5561c93465
-msgid "27.11"
+#: ../../source/benchmark/speed_benchmark.rst:79
+#: 1abffedc29e14737ba136d3afd32ecdb
+msgid "1.84"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:81
-#: fe2476be1e1e4fe78b61ef1568f74a09
+#: ../../source/benchmark/speed_benchmark.rst:83
+#: 3e26d2d99f7c4ae1b78efea53a5b0b86
 msgid "0.5B (vLLM)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:86
-#: dd484cd49ebf4be8b751b7fd8c4d6251
-msgid "270.49"
-msgstr ""
-
 #: ../../source/benchmark/speed_benchmark.rst:88
-#: 3ff18bb246f6426eae917293e1a3023f
-msgid "235.95"
+#: 9cffc232095b46f98fa934edfa9fc82a
+msgid "311.55"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:90
-#: 420043ef7154432c9caeba778897860b
-msgid "240.07"
+#: a1621494695b40539d326f69051d3bd1
+msgid "257.07"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:92
-#: ba025c19312c461d974b3d1e3620a5b0
-msgid "233.31"
+#: 902218e94a0b45a2b0814abf77e1733c
+msgid "260.93"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:94
-#: 513ca55c943349dbbfc57cf8454a368f
-msgid "256.16"
+#: 83eeb0af79b840c98e3f8617d3cb9df6
+msgid "261.95"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:96
-#: 862f3bc6ff0b4eb0a8f75d47edbb7175
-msgid "224.30"
+#: ba89d005351d47a08775ca517bdc2eb2
+msgid "304.79"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:98
-#: de1a2b035d6f4884b92a5ea391615fd4
-msgid "226.41"
+#: e294d594f7e143fba53e758c444efc95
+msgid "254.10"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:100
-#: 01684d53f5b14502ba31d26c8af11670
-msgid "222.83"
+#: 9fcfac1eb759416b978d72aa46549feb
+msgid "257.33"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:102
-#: 6996912f3e8a41d1a7032f04193d96fa
-msgid "108.89"
+#: e23ee2c559ba42ef854234fa06bb96ab
+msgid "259.80"
 msgstr ""
 
 #: ../../source/benchmark/speed_benchmark.rst:104
-#: 42fffbad74c74a498f2dc6707bec3e5a
-msgid "108.10"
+#: 00fe2e6cb3aa41819bb24dc8b448c890
+msgid "290.28"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:106
+#: a44f8bc6612f4ec2a2289f8772e41203
+msgid "243.69"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:108
+#: fd80649ef116497ba426edc5f45de453
+msgid "247.01"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:110
+#: 17522829c59e4a50a0a74eb2cde70310
+msgid "249.58"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:112
+#: ae18115b27d14aa98cd4607d76cdb707
+msgid "264.51"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:114
+#: 3eb46979e4b14a05adb3a51ec0396519
+msgid "223.86"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:116
+#: 92daab465ec24e04a71c26e03bddbf32
+msgid "226.50"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:118
+#: 04ae6d05605d4207874e855e1f63382d
+msgid "229.84"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:123
+#: 7d580f9796794f38a18900d9288d26be
+msgid "1.5B (Transformer)"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:128
+#: ../../source/benchmark/speed_benchmark.rst:167
+#: 8cafaa9450e3437abf042c21ba55a3d9
+msgid "Qwen2.5-1.5B-Instruct"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:128
+#: 594836ead3fa4979bf30fe8e593ab64a
+msgid "39.68"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:128
+#: 9bbe1bff4a2d4c508143cd3d7844e3ac
+msgid "2.95"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:130
+#: 286c67211dcd49a19496074e2517771b
+msgid "32.62"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:130
+#: 2a576f22580c426181320c74dc87c8cc
+msgid "1.82"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:132
+#: 8e2f4275615347c69930dbec4579b040
+msgid "43.33"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:132
+#: 49a7a00dc96d417b9185ac07be71bbcf
+msgid "1.18"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:134
+#: a91340cc3ff944fe81c3240c360d9176
+msgid "31.70"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:134
+#: 5630e80f72324811b0bf66d246d3e6ed
+msgid "1.51"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:136
+#: e56a73c68f8d4d79ba4e6c1a59a968ab
+msgid "40.88"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:136
+#: 28be0dafd3bd444daffd1857c35ec690
+msgid "3.43"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:138
+#: 315878f7c9954721ac17e8cf6cb491d7
+msgid "31.46"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:138
+#: 53963753bb0342c2af390f309a648c0c
+msgid "2.30"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:140
+#: 9a5379cd9b98494abfdb28c3c2444f87
+msgid "43.96"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:140
+#: 3625b1fbf7734df987a671113eba55fa
+msgid "1.66"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:142
+#: 1b96812e82e4488ca3884db5d4320bd1
+msgid "32.30"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:142
+#: 0e46565da0f64915aeb3481c4af21c83
+msgid "1.63"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:144
+#: 15b3d4f76a494739b1192616cf252f7b
+msgid "40.43"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:144
+#: 0c91eb650bfe40d99fe2b08c4afa7ea5
+msgid "4.16"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:146
+#: 4453e171ef40449688394f7caa821091
+msgid "31.06"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:146
+#: b82947c25bb24e63aee31d23420709d8
+msgid "3.03"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:148
+#: 56fa95a3619d483ba9cb5633c45bfe8d
+msgid "43.66"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:148
+#: 839a3b3ed2944d08a372913abc86152f
+msgid "2.39"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:150
+#: 6569123f209348398457ccb60a143d64
+msgid "32.39"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:150
+#: c045b35c103d40628a2130d55c0e1c93
+msgid "2.36"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:152
+#: 9e8b4dd9bbfb443ab1f61a17884fe032
+msgid "38.59"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:152
+#: f9c0a1da582040faba553fc47c22a3e4
+msgid "5.62"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:154
+#: 2a82ecd7acd04cd3844821a2a2c2c3cc
+msgid "31.04"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:154
+#: b81fb53168a74923a67e78eb8d6f55d9
+msgid "4.49"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:156
+#: 98cc75e2467d43bebb467c9d11855dec
+msgid "35.68"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:156
+#: 588d59302d0e4f09b2d2b57d7301bcba
+msgid "3.85"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:158
+#: ../../source/benchmark/speed_benchmark.rst:399
+#: 844f9407c689443fa1ae2d3e695b24cb
+msgid "31.95"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:158
+#: 023eb06cdee34b149643c00988002c3a
+msgid "3.82"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:162
+#: 6b3466db2b2b47f180d5efbbfda24c6a
+msgid "1.5B (vLLM)"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:167
+#: ad04f4ef634241d79c30196ef21b679e
+msgid "183.33"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:169
+#: 7f37cd63d1264cf6963410ab43648149
+msgid "201.67"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:171
+#: bbf2a54df99e4f1db895f83afa38dcc0
+msgid "217.03"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:173
+#: e07dc1d2fd724561ac8e3d2e6ac0a921
+msgid "213.74"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:175
+#: 813967b69dd84d38905372f6345c77e8
+msgid "176.68"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:177
+#: 0ce7d395f3c8430d95b6f46cbaa8d24a
+msgid "192.83"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:179
+#: b68465b2b131495eafa2c88c9108e53b
+msgid "206.63"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:181
+#: 6b2ed1180b2d48bb946e35745e163293
+msgid "203.64"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:183
+#: 1bc126490d144597b5482fa7eb95d7ee
+msgid "168.69"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:185
+#: 3008d00f21694735b61762245e677468
+msgid "183.69"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:187
+#: 9b5e5a5e53064237a599ec28b6edad57
+msgid "195.88"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:189
+#: ec6660511fd3433cae65ddf72115ffab
+msgid "192.64"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:191
+#: a92ee1c8880b4e8f867e167d7f364515
+msgid "152.04"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:193
+#: 486afa67eadd44c98ef4cf5a0c1109e4
+msgid "162.82"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:195
+#: 773b107c088e4531a8e85a01dca14097
+msgid "173.57"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:197
+#: 9f9c766b50784524a917b446c4efb36d
+msgid "170.20"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:202
+#: 483ecbced64a418484e87cf91bddc864
+msgid "3B (Transformer)"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:207
+#: ../../source/benchmark/speed_benchmark.rst:246
+#: cf2efd68fb8242579c64d663fccb8609
+msgid "Qwen2.5-3B-Instruct"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:207
+#: b2afee60bce043839b49365e21d428a8
+msgid "30.80"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:207
+#: 9a9021e7246440f0aa79a416167fdbad
+msgid "5.95"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:209
+#: f05d78f862224f4fa6a175a976db215a
+msgid "25.69"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:209
+#: 272e54f60c964f4bb5e8f19b05161a24
+msgid "3.38"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:211
+#: 416ea3f157ae482d83cbaf59b4c335b9
+msgid "35.21"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:211
+#: 6a14e7e3b3e349f3befa19e9e4226776
+msgid "2.06"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:213
+#: c2480623ad7b494093a85be577e2d027
+msgid "25.29"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:213
+#: 12256870989e4f95814f2f69765c1fbf
+msgid "2.50"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:215
+#: f9ff4018035f455c98fd2bfdc53d6590
+msgid "32.20"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:215
+#: 94f771c78314422aa83db128bb3baf03
+msgid "6.59"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:217
+#: aa71a5f4a5074e71b22b93ab64f39c69
+msgid "24.69"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:217
+#: 36ce5d858c5543af83b20ff393768594
+msgid "3.98"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:219
+#: bc6f69e4c1a947b081abfd72264d20fa
+msgid "34.47"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:219
+#: 95f555fad98a40a88ab75f0201370c13
+msgid "2.67"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:221
+#: dbfe10e718ef42b0bef355c3e80b66c1
+msgid "24.86"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:221
+#: d508010365f14f5fb36b7ea77b71752b
+msgid "2.62"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:223
+#: 79f8bc0282264082bf3c93eed3f04590
+msgid "31.72"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:223
+#: ee392132bcbf440fabf43db322c562df
+msgid "7.47"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:225
+#: 50925b544a2e4bc3aa0ccc257024ca4b
+msgid "24.70"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:225
+#: 9dba8e275b8c4f70b98f1f5970af8730
+msgid "4.89"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:227
+#: 71744764a5a449b7aac946d3153c56ab
+msgid "34.36"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:227
+#: af83bc61338e4e929a3f87afb1239db8
+msgid "3.58"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:229
+#: 70320737dc1d45ed889623d7ef2f8279
+msgid "25.19"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:229
+#: 17ccac6b5f504b069184fe9970d5edb2
+msgid "3.54"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:231
+#: 352f513b44bf447499e8667728f20c97
+msgid "25.37"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:231
+#: 28647cb7d4014e1a9f06ce0850449995
+msgid "9.30"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:233
+#: e7527af698f2446e83f80f26bfb8df24
+msgid "21.67"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:233
+#: 50664609916a4c4593d6150273c9c2b3
+msgid "6.72"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:235
+#: f2202ff4a16240dfadd8a336ce3e271a
+msgid "23.60"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:235
+#: 2da70f70e30b47df9635c2149ec819e8
+msgid "5.41"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:237
+#: 50052bf46d534c2f82cae42ce5b735cb
+msgid "24.56"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:237
+#: 041efa2948af4f5f9054b2ed838a6c9f
+msgid "5.37"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:241
+#: e561883e92854d4ebd5ca5b91ebada0c
+msgid "3B (vLLM)"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:246
+#: a88ee9d985b04ebdb0dd7d456084e4a4
+msgid "127.61"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:248
+#: 46442bb290974c1aafcf43c651a9616a
+msgid "150.02"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:250
+#: 12a16651244941e680062a2e2e1e86de
+msgid "168.20"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:252
+#: d617e2f92f0846a6a625b8c8992d3388
+msgid "165.50"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:254
+#: 12e6eef1b9d047a9a790905be0915b7b
+msgid "123.15"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:256
+#: 0e86425155a047b1ab633a8cfb284af2
+msgid "143.09"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:258
+#: 2b5000ec29c947ccb6de0cf86cc658f4
+msgid "159.85"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:260
+#: 901ff6020c7b420b869092e8a318ec82
+msgid "156.38"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:262
+#: 1f88e99d71b54a4b9cfd229d0c593ec7
+msgid "117.35"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:264
+#: 4f4aa3d8b9c04eab8d76f25617b42f50
+msgid "135.50"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:266
+#: 1121632487ac41a2a93f5246c286ae87
+msgid "149.35"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:268
+#: 7a63565c74d34e0e93ae496a8bf41d9c
+msgid "147.75"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:270
+#: 0a2b7b0877704fbdaa1565f6cc874c9e
+msgid "105.88"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:272
+#: efe8447fabd44706abd8aeffa943d78d
+msgid "118.38"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:274
+#: 408f7a3c81cc4285918e06b326250ae5
+msgid "129.28"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:276
+#: ea2479261ef04b93befc2819c3d7ef71
+msgid "127.19"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:281
+#: 749d8bad50d54d558a2a7c8109be1f15
+msgid "7B (Transformer)"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:286
+#: ../../source/benchmark/speed_benchmark.rst:326
+#: fcb4a3afd4e047bca683738287ef07a3
+msgid "Qwen2.5-7B-Instruct"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:286
+#: 3c2c924335a441b4b3af27d136fbc23f
+msgid "40.38"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:286
+#: 27c0b52647f94f8d9d97f24af273d88e
+msgid "14.38"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:288
+#: bfb1234b9b474efd8adb54779a39cb0d
+msgid "31.55"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:288
+#: 6153946a227a453da7955093584fa625
+msgid "8.42"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:290
+#: 08f508125e15429aa02beb09ee6bd4e2
+msgid "43.10"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:290
+#: b934cda8a5c044d798046460029750d3
+msgid "5.52"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:292
+#: 0570ef47e5624272917ce2938d90ccb0
+msgid "32.03"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:292
+#: 955fa198c3994dff9814022acdc7a7a2
+msgid "5.39"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:294
+#: 2870ce90ae4c44e8a02f58fc5cf34614
+msgid "38.76"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:294
+#: 7ec940d461ef46f88084e313136078d4
+msgid "15.38"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:296
+#: 09967cea16854a2bb9de46c214b24651
+msgid "31.26"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:296
+#: 04d7d18c4b6744e1af9f5dce4723e274
+msgid "9.43"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:298
+#: 5e18fdfb38d7477daba502e09cb5a01b
+msgid "38.27"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:298
+#: aac8b1fb7c8d43a8a40abe03c97ec5e3
+msgid "6.52"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:300
+#: db61ac295ebe4d3a96dc0de0500f9479
+msgid "32.37"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:300
+#: ../../source/benchmark/speed_benchmark.rst:593
+#: ../../source/benchmark/speed_benchmark.rst:595
+#: 5c81460630084159bd47be4557a3ce9f
+msgid "6.39"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:302
+#: 4c651b836f86491f87eb22f2fad0c5a1
+msgid "29.78"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:302
+#: ff906cf57c0b49c092513cdc11b47c13
+msgid "16.91"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:304
+#: 72abf1bb71494eb6865366749931a5b7
+msgid "26.86"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:304
+#: b1aae2fc1da948ca866a025f39608846
+msgid "10.96"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:306
+#: f3993efb0e964ccdabf24e8493c044a4
+msgid "28.70"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:306
+#: d32034adfd824cb696afdef90fa076d4
+msgid "8.05"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:308
+#: 03ae9c2b037a477c84eeef5968c04b1e
+msgid "30.23"
+msgstr ""
+
+#: ../../source/benchmark/speed_benchmark.rst:308
+#: 0ec8f6316066482c9f6fa5e20a2c880b
+msgid "7.92"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:106
-#: 9bc4d5a9413640bfaf2e8cdaf3d9de23
-msgid "106.51"
+#: ../../source/benchmark/speed_benchmark.rst:310
+#: 6be1e0b48817441195446f112775565f
+msgid "18.83"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:108
-#: 1f0f4505f7284df78a4322a61914f601
-msgid "104.16"
+#: ../../source/benchmark/speed_benchmark.rst:310
+#: 58d02bb2fa094a87b3bd01cdb52f64be
+msgid "19.97"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:110
-#: 461402e6ab4f470c9982574e2fa8bc68
-msgid "97.20"
+#: ../../source/benchmark/speed_benchmark.rst:312
+#: 19ad93354d704591bb1d3e79879f216f
+msgid "17.59"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:112
-#: bdaa7f6e026f47a4b2cacb232ad06387
-msgid "94.49"
+#: ../../source/benchmark/speed_benchmark.rst:312
+#: 78c7430820c54495ae2ecfa313196522
+msgid "14.01"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:114
-#: 293677d7f8b9447e918689bb816f3579
-msgid "93.94"
+#: ../../source/benchmark/speed_benchmark.rst:314
+#: d96d0f8ac0d94a3487988708aa0b6fe3
+msgid "18.45"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:116
-#: 3f723444a6184d3cb313ae0c6a7787dd
-msgid "92.23"
+#: ../../source/benchmark/speed_benchmark.rst:314
+#: 002dc187a0e64f38ac9d3bee7808afee
+msgid "11.11"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:120
-#: 3bc05396581d451091e1ab8447ac93e6
-msgid "1.5B (Transformer)"
+#: ../../source/benchmark/speed_benchmark.rst:316
+#: a810b1a65945460ba05e24885b7f3dc0
+msgid "19.11"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:125
-#: ../../source/benchmark/speed_benchmark.rst:163
-#: c075f7be1bd3453582f4853d19cf6c5a
-msgid "Qwen2-1.5B-Instruct"
+#: ../../source/benchmark/speed_benchmark.rst:316
+#: c8d7a7404bcc4fae85a53665dc095a49
+msgid "10.98"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:125
-#: 940f647973f941f18361f2102d70056b
-msgid "40.89"
+#: ../../source/benchmark/speed_benchmark.rst:321
+#: 133954fb306d48a9b11acf7b091c51a2
+msgid "7B (vLLM)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:125
-#: 33a93e83ba564a9e917b2d6a754439fa
-msgid "3.44"
+#: ../../source/benchmark/speed_benchmark.rst:326
+#: 3b16a6d4302d4f9fac1c7efbca5b7252
+msgid "84.28"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:127
-#: 84d70d96dde541f5971359449048324d
-msgid "31.51"
+#: ../../source/benchmark/speed_benchmark.rst:328
+#: 54b65a2e12f64715a9849f4731422531
+msgid "122.01"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:127
-#: c30109a7082543f9bc6a5fe55a8b6a58
-msgid "2.31"
+#: ../../source/benchmark/speed_benchmark.rst:330
+#: 4df649f9e5f14da1a13f78d05db12740
+msgid "154.05"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:129
-#: 6f68abe9e42c47639eeaeb4f5609f7b4
-msgid "42.47"
+#: ../../source/benchmark/speed_benchmark.rst:332
+#: 5bee2fef808c4a94babaa97b2682ae2f
+msgid "148.10"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:129
-#: 73734b29fe234dbfbafba34f242c1e4a
-msgid "1.67"
+#: ../../source/benchmark/speed_benchmark.rst:334
+#: b5c20c1bf3dc405692a94240e8eb290e
+msgid "80.70"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:131
-#: c1fa797eb41a422f9361864ff712afa2
-msgid "33.62"
+#: ../../source/benchmark/speed_benchmark.rst:336
+#: 94fecae3503c4b839016be3d81c971f6
+msgid "112.38"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:131
-#: e700d18b36e64ad0a232c0214294920b
-msgid "1.64"
+#: ../../source/benchmark/speed_benchmark.rst:338
+#: ec9073a5cad8441ead072bf56f5f637c
+msgid "141.98"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:133
-#: 688d1121e73a4812a4df971e66eb67c1
-msgid "40.86"
+#: ../../source/benchmark/speed_benchmark.rst:340
+#: 6483afea40794a6aa6ebfb50edfc271d
+msgid "137.64"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:133
-#: a3b341cd57c64071af1e1177c9eef514
-msgid "8.74"
+#: ../../source/benchmark/speed_benchmark.rst:342
+#: 45c80e2974ec41e99bfefac1db1b3464
+msgid "77.69"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:135
-#: ea1ea6dae9f04dec9ec1577bd1fb199e
-msgid "31.31"
+#: ../../source/benchmark/speed_benchmark.rst:344
+#: 83fb7e8619c14079b310ca66b65aa067
+msgid "105.25"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:135
-#: caadde1aa8004027a18d1c239f6e5e14
-msgid "7.59"
+#: ../../source/benchmark/speed_benchmark.rst:346
+#: af853ac2bfd54a49a45aa4369a5a0fba
+msgid "129.35"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:137
-#: 765e83dfbd2f40e7b77ca5a50feb121a
-msgid "42.78"
+#: ../../source/benchmark/speed_benchmark.rst:348
+#: 4172734562c4474aaeb2498d636822f1
+msgid "124.91"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:137
-#: 93a119e08db942b29b2bcd62b9add69a
-msgid "6.95"
+#: ../../source/benchmark/speed_benchmark.rst:350
+#: 40320ba4e1194eb99d2473e4491bb7f8
+msgid "70.33"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:139
-#: c7948bf6de3d40c3b36c011960dfcce0
-msgid "32.90"
+#: ../../source/benchmark/speed_benchmark.rst:352
+#: 097b95f7850c410295defa167ccbe7b3
+msgid "90.71"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:139
-#: 42f7420278f0473390c49416b4f41d34
-msgid "6.92"
+#: ../../source/benchmark/speed_benchmark.rst:354
+#: b97dce41bc2d4921a4fc00799bbc5dcd
+msgid "108.30"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:141
-#: 5b3fe348759643dc9c660c83da77b653
-msgid "40.08"
+#: ../../source/benchmark/speed_benchmark.rst:356
+#: 3c3cae4399744a6190c88476f3dd79d2
+msgid "104.66"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:141
-#: 773ea8ca41ba4c3e8e8e958b597c0fcb
-msgid "15.92"
+#: ../../source/benchmark/speed_benchmark.rst:358
+#: ../../source/benchmark/speed_benchmark.rst:454
+#: ../../source/benchmark/speed_benchmark.rst:555
+#: ../../source/benchmark/speed_benchmark.rst:664
+#: 9259c8941b6b4551b9f0bc24f9ef2523
+msgid "63488"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:143
-#: e1bb60b24d8b442f99ee0b8682bde89c
-msgid "31.19"
+#: ../../source/benchmark/speed_benchmark.rst:358
+#: f423ec2646fa49b18d7ebaab159c7a21
+msgid "50.86"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:143
-#: 018567fe9dca42dfb12653d1977ccfe8
-msgid "14.79"
+#: ../../source/benchmark/speed_benchmark.rst:358
+#: ../../source/benchmark/speed_benchmark.rst:360
+#: ../../source/benchmark/speed_benchmark.rst:362
+#: ../../source/benchmark/speed_benchmark.rst:364
+#: ../../source/benchmark/speed_benchmark.rst:454
+#: ../../source/benchmark/speed_benchmark.rst:456
+#: ../../source/benchmark/speed_benchmark.rst:458
+#: ../../source/benchmark/speed_benchmark.rst:460
+#: ../../source/benchmark/speed_benchmark.rst:555
+#: ../../source/benchmark/speed_benchmark.rst:557
+#: ../../source/benchmark/speed_benchmark.rst:559
+#: ../../source/benchmark/speed_benchmark.rst:561
+#: 9f914a0377a345d49e368e87916880dc
+msgid "setting-64k"
+msgstr "setting-64k"
+
+#: ../../source/benchmark/speed_benchmark.rst:360
+#: ae5ea50e06bc4a2aae58f40493a9f05b
+msgid "60.52"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:145
-#: c69e4b4c0c604dc48eb84e00cfde594f
-msgid "42.25"
+#: ../../source/benchmark/speed_benchmark.rst:362
+#: dfc99e88603e4ecea4402dab9745c3b2
+msgid "67.97"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:145
-#: 20b961001b59497f8ab30c4085507df3
-msgid "14.14"
+#: ../../source/benchmark/speed_benchmark.rst:364
+#: f522310880a04e2a81a3b3ac5cedb576
+msgid "66.42"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:147
-#: 8c68b14d52844f2d886c35a686834974
-msgid "33.24"
+#: ../../source/benchmark/speed_benchmark.rst:366
+#: ../../source/benchmark/speed_benchmark.rst:462
+#: ../../source/benchmark/speed_benchmark.rst:563
+#: ../../source/benchmark/speed_benchmark.rst:672
+#: faefdd63feaa4a01b2752e5598b532f4
+msgid "129024"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:147
-#: 0212afe446b145549d91c105e5e76614
-msgid "14.12"
+#: ../../source/benchmark/speed_benchmark.rst:366
+#: c42f54029293491b99c956c75c70b668
+msgid "28.94"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:149
-#: 265ca5aa4b75412cb97db32ede5c0b80
-msgid "34.09"
+#: ../../source/benchmark/speed_benchmark.rst:366
+#: ../../source/benchmark/speed_benchmark.rst:368
+#: ../../source/benchmark/speed_benchmark.rst:370
+#: ../../source/benchmark/speed_benchmark.rst:372
+#: ../../source/benchmark/speed_benchmark.rst:462
+#: ../../source/benchmark/speed_benchmark.rst:464
+#: ../../source/benchmark/speed_benchmark.rst:466
+#: ../../source/benchmark/speed_benchmark.rst:468
+#: ../../source/benchmark/speed_benchmark.rst:563
+#: ../../source/benchmark/speed_benchmark.rst:565
+#: ../../source/benchmark/speed_benchmark.rst:567
+#: ../../source/benchmark/speed_benchmark.rst:569
+#: 85d257e52ab94f73a93fa58256c33254
+msgid "vllm==0.6.2, new sample config"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:149
-#: 624e92cc77e74e74aa6a9b39bfecb4fd
-msgid "30.31"
+#: ../../source/benchmark/speed_benchmark.rst:368
+#: f3c0b8aad8b447cf9094e0f935dd6154
+msgid "25.97"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:151
-#: 51548441e0e344cfb930a8834eb4487f
-msgid "28.52"
+#: ../../source/benchmark/speed_benchmark.rst:370
+#: 853ab75633f34606b5c88c7c2fde8b9e
+msgid "26.37"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:151
-#: 6ae32ef178ec468885ed6481e3309479
-msgid "29.18"
+#: ../../source/benchmark/speed_benchmark.rst:372
+#: 0455b0ef546046b8aa90f9fea2c9c227
+msgid "26.57"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:153
-#: 9de15749ba814fe5a9cd3eed27b72791
-msgid "31.30"
+#: ../../source/benchmark/speed_benchmark.rst:375
+#: ../../source/benchmark/speed_benchmark.rst:471
+#: ../../source/benchmark/speed_benchmark.rst:575
+#: 72d24e828e8943af941fe8a9cebd7d0b
+msgid "[Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)"
+msgstr "[Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)"
+
+#: ../../source/benchmark/speed_benchmark.rst:376
+#: ../../source/benchmark/speed_benchmark.rst:472
+#: ../../source/benchmark/speed_benchmark.rst:576
+#: 11198a89e8b7487e828a4f6a09b88099 120e6bede1b546419976fec97f3a7ca2
+#: 72ed1e8d670944b497fd3cb233e311c9
+msgid "[new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7,top_p=0.8,top_k=20,repetition_penalty=1,presence_penalty=0,frequency_penalty=0,max_tokens=out_length)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:153
-#: 770e0c3fc018489eae33ae577cca5cec
-msgid "28.54"
+#: ../../source/benchmark/speed_benchmark.rst:378
+#: 68fe7a0cc323439ca01ff176b7ec3f75
+msgid "14B (Transformer)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:155
-#: f7130dfa6c33425d86256e256d858806
-msgid "32.16"
+#: ../../source/benchmark/speed_benchmark.rst:383
+#: ../../source/benchmark/speed_benchmark.rst:422
+#: 0efef6844eb24c1ca0e88aa055bb539d
+msgid "Qwen2.5-14B-Instruct"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:155
-#: 9147b886700248fb8c2727be584ae56b
-msgid "28.51"
+#: ../../source/benchmark/speed_benchmark.rst:383
+#: 346d77e66b5643378af83efb842ae18c
+msgid "24.74"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:158
-#: d92811c61f17424b8776a4e80e11ca58
-msgid "1.5B (vLLM)"
+#: ../../source/benchmark/speed_benchmark.rst:383
+#: c197b99d194340f6aa9b71370bc60405
+msgid "28.08"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:163
-#: 54ce51f375824d0094cc2e0926faab25
-msgid "175.55"
+#: ../../source/benchmark/speed_benchmark.rst:385
+#: 84c9e131c7ae44ce9f22a73afe5abf8c
+msgid "18.84"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:165
-#: 0b778c681ce54d66be31e0cd56771b23
-msgid "172.28"
+#: ../../source/benchmark/speed_benchmark.rst:385
+#: 1fc65ba82d28427d9ffad1e25cf47855
+msgid "16.11"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:167
-#: f10c01b0ec4949ce84cbc10634cef502
-msgid "184.58"
+#: ../../source/benchmark/speed_benchmark.rst:387
+#: d9d3a17ff48d4f62a137d86ea0d2d7b3
+msgid "25.89"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:169
-#: 24aa76bfc9e74b2da07dbd9b8d002f8f
-msgid "170.87"
+#: ../../source/benchmark/speed_benchmark.rst:387
+#: c3c05e47a2a442c3be880a0c47641c28
+msgid "9.94"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:171
-#: 84d125e1ff8c4b22a31e67956aafaccb
-msgid "166.23"
+#: ../../source/benchmark/speed_benchmark.rst:389
+#: af27bcc2ffb74337996a9daf330f6393
+msgid "19.23"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:173
-#: b25360ebc8e64572afddba261b2e0454
-msgid "164.32"
+#: ../../source/benchmark/speed_benchmark.rst:389
+#: 6a6f9c24e8864b1e9a45a98ea55cd547
+msgid "9.79"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:175
-#: 2371d6f852d9434da97e0c241fe2df5b
-msgid "174.04"
+#: ../../source/benchmark/speed_benchmark.rst:391
+#: 4d53012c3bc04013bdd4239f79f2999a
+msgid "20.51"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:177
-#: cec70be90f1246829919090565da1d04
-msgid "162.81"
+#: ../../source/benchmark/speed_benchmark.rst:391
+#: 54b11f8eea1f443696134b8190625699
+msgid "29.50"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:179
-#: 62e808f52f67419291ce162de4f75483
-msgid "83.67"
+#: ../../source/benchmark/speed_benchmark.rst:393
+#: f83b5b1af9544fc0adaa5a345984d1f3
+msgid "17.80"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:181
-#: 325a76ef42644a9e81aa6841e40d2b49
-msgid "98.63"
+#: ../../source/benchmark/speed_benchmark.rst:393
+#: d859fc3611fa4c73acf71bc6385e67c4
+msgid "17.61"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:183
-#: 9fd5160c9fba4c0885e594392be90344
-msgid "97.65"
+#: ../../source/benchmark/speed_benchmark.rst:395
+#: 3184881f5dc540de99cf1d03c0531483
+msgid "20.06"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:185
-#: ca9f28089d854da28bd96755773ad0b7
-msgid "92.48"
+#: ../../source/benchmark/speed_benchmark.rst:395
+#: 530e7ceefa3347439c9a19a4ed0c96b3
+msgid "11.36"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:187
-#: b6a87ff7de0f443faa640c5211e0895c
-msgid "77.69"
+#: ../../source/benchmark/speed_benchmark.rst:397
+#: 99a4ad03e5164cf2bdf046340c1609df
+msgid "19.21"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:189
-#: aaf0a02fd4d34de7bb47aacdd731ca6d
-msgid "86.42"
+#: ../../source/benchmark/speed_benchmark.rst:397
+#: 7b343b8a4e284ad19be19e592cf9b2fb
+msgid "11.22"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:191
-#: edf6ad7d5056492ebdfe8f6dfacf7c05
-msgid "87.49"
+#: ../../source/benchmark/speed_benchmark.rst:399
+#: 110d7d092a1347caab9e1566fbdcd886
+msgid "13.92"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:193
-#: 177861083b7b4faaaea3aed0bde2ee5e
-msgid "82.88"
+#: ../../source/benchmark/speed_benchmark.rst:401
+#: 20d86db591b34f81a37003458663f10a
+msgid "12.66"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:197
-#: 591b5d8eb7cf4e268960c3c765641bbc
-msgid "7B (Transformer)"
+#: ../../source/benchmark/speed_benchmark.rst:401
+#: 0ed4b55046b4464da3237660400a0242
+msgid "19.98"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:202
-#: ../../source/benchmark/speed_benchmark.rst:241
-#: 2425a940728e4d9ea72b078da4335782
-msgid "Qwen2-7B-Instruct"
+#: ../../source/benchmark/speed_benchmark.rst:403
+#: 7be270ba5da041618a1c08e985376fd3
+msgid "13.79"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:202
-#: 13be9dd7081145d09cf5136f67a9fe3c
-msgid "37.97"
+#: ../../source/benchmark/speed_benchmark.rst:403
+#: ../../source/benchmark/speed_benchmark.rst:495
+#: 6e07ed9db7c14005b267175081822d32
+msgid "13.81"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:202
-#: a8306c2c6628426ba629671f2b91d432
-msgid "14.92"
+#: ../../source/benchmark/speed_benchmark.rst:405
+#: bbb1220bd1714c0298ddb91667a1c716
+msgid "14.17"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:204
-#: 7d2cae030101487294be3e6005c861fc
-msgid "30.85"
+#: ../../source/benchmark/speed_benchmark.rst:405
+#: edc522c6e4514cc08e776ae31bb1e6ae
+msgid "13.67"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:204
-#: 5bda58cdaa24458faf190e38cbad13f5
-msgid "8.97"
+#: ../../source/benchmark/speed_benchmark.rst:407
+#: 5109d14859a144adbbc0b64a741a0990
+msgid "8.20"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:206
-#: 2a27fc54d87e41a9ab1e8992df16b23b
-msgid "36.17"
+#: ../../source/benchmark/speed_benchmark.rst:407
+#: e9f14eac53b044918ae403d7144158dd
+msgid "36.85"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:206
-#: ffe46abbb7824a679d02206dee3333ab
-msgid "6.06"
+#: ../../source/benchmark/speed_benchmark.rst:409
+#: 2affc3a315b544b6b0ed5ad1372a1fe3
+msgid "7.77"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:208
-#: 36df0c1fcd6a4407bfe596b1a1aa4a5b
-msgid "33.08"
+#: ../../source/benchmark/speed_benchmark.rst:409
+#: 81fee54867b346919cfac7517d85b005
+msgid "24.88"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:210
-#: ca21c086017e42c08ae8b48e38ba2e44
-msgid "34.74"
+#: ../../source/benchmark/speed_benchmark.rst:411
+#: 720a6dac666b4c938ee05ce98aa68ad3
+msgid "8.14"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:210
-#: ad871a1ca3ef4c96bdf19b804c822fb4
-msgid "20.26"
+#: ../../source/benchmark/speed_benchmark.rst:411
+#: 8e64b76bf61e4ef7a3521057cdf7a12e
+msgid "18.71"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:212
-#: 2add23066bb544ca801a8e8aaefd21f4
-msgid "31.13"
+#: ../../source/benchmark/speed_benchmark.rst:413
+#: adc199b98bf840c6b6ef0fd0740b0c6d
+msgid "8.31"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:212
-#: 5a49e504bb8349db8c995ec87262c6df
-msgid "14.31"
+#: ../../source/benchmark/speed_benchmark.rst:413
+#: 6a7087f37f914919bd19117a2e04c592
+msgid "18.57"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:214
-#: 3592401b6bfc46f68f11e99174f16644
-msgid "33.34"
+#: ../../source/benchmark/speed_benchmark.rst:417
+#: 87cc0727f7ee43efb8cec7de683f6e46
+msgid "14B (vLLM)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:214
-#: 7e085044411946c5b74535ee688e7578
-msgid "11.40"
+#: ../../source/benchmark/speed_benchmark.rst:422
+#: ../../source/benchmark/speed_benchmark.rst:634
+#: 4b4c0bd7f6a942a585380f80d256bc75
+msgid "46.30"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:216
-#: fec42d24f42a41d9873614085b1b4e31
-msgid "30.86"
+#: ../../source/benchmark/speed_benchmark.rst:424
+#: 991431ee7fc44fe3bf94cd56c5f4a4f4
+msgid "70.40"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:216
-#: b29634fb88a74077b1f1be29df2a1645
-msgid "11.27"
+#: ../../source/benchmark/speed_benchmark.rst:426
+#: 0ec9079a10dc46e6b95ac2296ef9ab0d
+msgid "98.02"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:218
-#: 9455c7d10a054c709e2ceabf6e85b819
-msgid "26.63"
+#: ../../source/benchmark/speed_benchmark.rst:428
+#: 7731361c04c242b7aeaafc926b6f50d6
+msgid "92.66"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:218
-#: fb7f29c39ecf4f408be4c6fc67debc47
-msgid "27.71"
+#: ../../source/benchmark/speed_benchmark.rst:430
+#: b0c3cfc6236c4cf391838ece8a48c0e6
+msgid "43.83"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:220
-#: 65b74ffea4554c749a6a0a3de7f676ab
-msgid "24.58"
+#: ../../source/benchmark/speed_benchmark.rst:432
+#: 920fdbd27f9a430797005289f667804a
+msgid "64.33"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:220
-#: 738cf84c7aa344bf93ee7fadba3f4309
-msgid "21.76"
+#: ../../source/benchmark/speed_benchmark.rst:434
+#: b4eb38977b024995a7a831649569532d
+msgid "86.10"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:222
-#: 73a7770b142f445eb30cb8382df8ea63
-msgid "25.81"
+#: ../../source/benchmark/speed_benchmark.rst:436
+#: 3add69ed82814a6a9fdbfc59deb9eb06
+msgid "83.11"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:222
-#: ca82acc2868442c8bef3498d26b0fb6a
-msgid "18.86"
+#: ../../source/benchmark/speed_benchmark.rst:438
+#: 011f7d1727634bd5a51339523d7e1e4b
+msgid "41.91"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:224
-#: 00333797cbb34fddb0640685c2c18503
-msgid "18.72"
+#: ../../source/benchmark/speed_benchmark.rst:440
+#: 2df263f68c7b436cbc81f685ef8590a0
+msgid "59.21"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:226
-#: 0ade2fd6695d4947aa5d9f7ee6691f4d
-msgid "17.49"
+#: ../../source/benchmark/speed_benchmark.rst:442
+#: 8fd9a18276454cf48e5af21eb63b91f5
+msgid "76.85"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:226
-#: 7dec10fbf5c040628de9df93e9c43799
-msgid "42.62"
+#: ../../source/benchmark/speed_benchmark.rst:444
+#: 83b581b750744226b2f42f0ecca15479
+msgid "74.03"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:228
-#: 3f53840a8d974f52a121217b307ac088
-msgid "16.69"
+#: ../../source/benchmark/speed_benchmark.rst:446
+#: c68286fe55f84456b87c26df79ba1894
+msgid "37.18"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:228
-#: 8aa4b800aadf451ab3a9a8fb57897120
-msgid "36.67"
+#: ../../source/benchmark/speed_benchmark.rst:448
+#: 83a7ac03752a402bbdf2bc83fe69d054
+msgid "49.23"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:230
-#: 0e67e5cc63804e18aceaeb5ada3a94f5
-msgid "17.17"
+#: ../../source/benchmark/speed_benchmark.rst:450
+#: 4576575e5f394ae0bc98007fd430bffe
+msgid "60.91"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:230
-#: db55bfed060249c284b3b0136de467dc
-msgid "33.76"
+#: ../../source/benchmark/speed_benchmark.rst:452
+#: 869122e329704fa294c32d1d689b17d4
+msgid "59.01"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:232
-#: 7fe02e10cb3c4d23a8c031e0428e133c
-msgid "17.87"
+#: ../../source/benchmark/speed_benchmark.rst:454
+#: 3d200274a72c4ae4b220e45156d40e40
+msgid "26.85"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:232
-#: b5692baa3c674fca950de5a1b9f0cf9e
-msgid "33.63"
+#: ../../source/benchmark/speed_benchmark.rst:456
+#: c8602005888643a9816df3d0d3f6a7f5
+msgid "32.83"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:236
-#: 6e13a225b07b4757a3bf80d9aa33cc2c
-msgid "7B (vLLM)"
+#: ../../source/benchmark/speed_benchmark.rst:458
+#: b1d59857a7d94ffabfa028fee7247413
+msgid "37.67"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:241
-#: 57234c3af122433396e3e87d36299bcb
-msgid "80.45"
+#: ../../source/benchmark/speed_benchmark.rst:460
+#: 79106824e09043d7a04261798428aa65
+msgid "36.71"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:243
-#: c5ef1e109ba5454bb955d15d7ce21818
-msgid "114.32"
+#: ../../source/benchmark/speed_benchmark.rst:462
+#: 4c82b240c59c40c0bce5e21ce8095eee
+msgid "14.53"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:245
-#: 3df1a4c3f1c44f839d10d7ac7116a2a8
-msgid "143.40"
+#: ../../source/benchmark/speed_benchmark.rst:464
+#: 7c1dcf878d8542e9b0ec91273f74de20
+msgid "15.10"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:247
-#: 3d59b820531d4ac59651d0b8b18b19fb
-msgid "96.65"
+#: ../../source/benchmark/speed_benchmark.rst:466
+#: 4d873b6a2d4448069f38d9f87857a3af
+msgid "15.13"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:249
-#: 89b50d236ae14900bdeae0d1886c3f00
-msgid "76.41"
+#: ../../source/benchmark/speed_benchmark.rst:468
+#: d7815412d81b4d858da6c984d96b4621
+msgid "15.25"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:251
-#: 60ec294ddffa41fdb80bee7d408fe785
-msgid "107.02"
+#: ../../source/benchmark/speed_benchmark.rst:476
+#: 7b6612a777c044f084bd33c1dc385fdc
+msgid "32B (Transformer)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:253
-#: 0c0b7dbd9e1a4e6b94f5fc7f0ddaad46
-msgid "131.55"
+#: ../../source/benchmark/speed_benchmark.rst:481
+#: ../../source/benchmark/speed_benchmark.rst:523
+#: 1f29ef2c5a544cb39638becddcfdcf49
+msgid "Qwen2.5-32B-Instruct"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:255
-#: 214fdc0aeede43a19e44a01625f2776b
-msgid "91.38"
+#: ../../source/benchmark/speed_benchmark.rst:481
+#: 53d8e92e364c4649bdd593b14d83e100
+msgid "17.54"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:257
-#: 8be1d5ce382b44a6812db41b83a93747
-msgid "66.54"
+#: ../../source/benchmark/speed_benchmark.rst:481
+#: e553d565e5b140f385b495303019c033
+msgid "61.58"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:259
-#: e378d26b45584f00867b6aff18a201e1
-msgid "89.72"
+#: ../../source/benchmark/speed_benchmark.rst:483
+#: 07025cd0b8394353b5a33f848fcbacca
+msgid "14.52"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:261
-#: f71fd9f2f0774608bf4ce9c1e91b9de6
-msgid "97.93"
+#: ../../source/benchmark/speed_benchmark.rst:483
+#: f6ece8fde194423b9659c7473c1b0a18
+msgid "33.56"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:263
-#: 021c3f5aa7ac48ee9ae264aa74eba998
-msgid "76.87"
+#: ../../source/benchmark/speed_benchmark.rst:485
+#: 665142835b2a4b6b8f74b32474c15856
+msgid "19.20"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:265
-#: b27fd4caaacd4cebb883d46d69213898
-msgid "55.83"
+#: ../../source/benchmark/speed_benchmark.rst:485
+#: cffbc93386f9488dba018b96ecc5581c
+msgid "18.94"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:267
-#: 8cdb1a3217bf404399522fd20eb7c4d8
-msgid "71.58"
+#: ../../source/benchmark/speed_benchmark.rst:487
+#: e5a38d880f6745159a518853d68abb93
+msgid "14.60"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:269
-#: ee2c11a873f8417b8e87a646254ae81b
-msgid "81.48"
+#: ../../source/benchmark/speed_benchmark.rst:487
+#: 07bf7781c6e946e2b1cda10f9a7aabf5
+msgid "18.67"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:271
-#: 6dd766b8464449238a924226065364ea
-msgid "63.62"
+#: ../../source/benchmark/speed_benchmark.rst:489
+#: a08a0f85da8a486f86aa44ef1d283835
+msgid "12.49"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:273
-#: ../../source/benchmark/speed_benchmark.rst:435
-#: f02b84950b1045b3962f30a0898657e6
-msgid "63488"
+#: ../../source/benchmark/speed_benchmark.rst:489
+#: 83fa297e1f9a4532842be69de2b0fe13
+msgid "63.72"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:273
-#: 48ea75f36ff24d8f8d1c276a3020bc8d
-msgid "41.20"
+#: ../../source/benchmark/speed_benchmark.rst:491
+#: e4affb2a31894be09864f4b3964fe1db
+msgid "11.61"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:275
-#: 849a3504e01d406fbe7ebe5c1df7f7f5
-msgid "49.37"
+#: ../../source/benchmark/speed_benchmark.rst:491
+#: 51490a4b4185419f81e4e04e17593d7f
+msgid "35.86"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:277
-#: 0ed4057d80274bad9344c263ffab2287
-msgid "54.12"
+#: ../../source/benchmark/speed_benchmark.rst:493
+#: b8c6f8f855c94d9aae9b499ab717d079
+msgid "13.42"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:279
-#: a6b5c183234541a5ad3d0a4477bb63fd
-msgid "45.89"
+#: ../../source/benchmark/speed_benchmark.rst:493
+#: 53c8e2e22cd34cbc9fa7c39806cb0ac3
+msgid "21.09"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:281
-#: ../../source/benchmark/speed_benchmark.rst:443
-#: d4b4a84209274040a7285fc33bf13e93
-msgid "129024"
+#: ../../source/benchmark/speed_benchmark.rst:495
+#: 8ef02e5f56a74fd690dfd100b3d3ce21
+msgid "20.81"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:281
-#: d8feeaa0ce814250bfe7706195f5e51c
-msgid "25.01"
+#: ../../source/benchmark/speed_benchmark.rst:497
+#: fde2d9af0fc942e59415a2e773262689
+msgid "8.95"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:283
-#: ../../source/benchmark/speed_benchmark.rst:399
-#: eb9cc19c7743488a83b40c84b3edfabc
-msgid "27.73"
+#: ../../source/benchmark/speed_benchmark.rst:497
+#: 332e6ffd84104fbdb0263a32e49ba96d
+msgid "67.31"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:285
-#: ae31f01fe1764ce2ae730eaaf1a20a1a
-msgid "29.39"
+#: ../../source/benchmark/speed_benchmark.rst:499
+#: cbc68892cd7c43a7b7b3b66b974b380b
+msgid "8.53"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:287
-#: aab6112852674b11b9f30d0fbdde451e
-msgid "27.13"
+#: ../../source/benchmark/speed_benchmark.rst:499
+#: 58736c51fa504f299e8e3b57431784b5
+msgid "39.28"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:291
-#: 7ecf74aaa7344de2b10d3e451117ca3a
-msgid "57B-A14B (Transformer)"
+#: ../../source/benchmark/speed_benchmark.rst:501
+#: cd0d026656d4409e9d2bb254d905897e
+msgid "9.48"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:296
-#: ../../source/benchmark/speed_benchmark.rst:318
-#: ../../source/benchmark/speed_benchmark.rst:334
-#: ../../source/benchmark/speed_benchmark.rst:338
-#: f6db37832efd438894e5a7520268a63c
-msgid "Qwen2-57B-A14B-Instruct"
+#: ../../source/benchmark/speed_benchmark.rst:501
+#: 3ee024efbab945de80cabb50963966a9
+msgid "24.67"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:296
-#: ../../source/benchmark/speed_benchmark.rst:300
-#: ../../source/benchmark/speed_benchmark.rst:304
-#: ../../source/benchmark/speed_benchmark.rst:308
-#: ../../source/benchmark/speed_benchmark.rst:318
-#: ../../source/benchmark/speed_benchmark.rst:320
-#: ../../source/benchmark/speed_benchmark.rst:322
-#: ../../source/benchmark/speed_benchmark.rst:324
-#: ../../source/benchmark/speed_benchmark.rst:350
-#: ../../source/benchmark/speed_benchmark.rst:352
-#: ../../source/benchmark/speed_benchmark.rst:358
-#: ../../source/benchmark/speed_benchmark.rst:360
-#: ../../source/benchmark/speed_benchmark.rst:368
-#: ../../source/benchmark/speed_benchmark.rst:376
-#: ../../source/benchmark/speed_benchmark.rst:378
-#: ../../source/benchmark/speed_benchmark.rst:380
-#: ../../source/benchmark/speed_benchmark.rst:389
-#: ../../source/benchmark/speed_benchmark.rst:393
-#: ../../source/benchmark/speed_benchmark.rst:397
-#: ../../source/benchmark/speed_benchmark.rst:399
-#: ../../source/benchmark/speed_benchmark.rst:403
-#: ../../source/benchmark/speed_benchmark.rst:407
-#: ../../source/benchmark/speed_benchmark.rst:409
-#: ../../source/benchmark/speed_benchmark.rst:413
-#: ../../source/benchmark/speed_benchmark.rst:415
-#: ../../source/benchmark/speed_benchmark.rst:417
-#: ../../source/benchmark/speed_benchmark.rst:421
-#: ../../source/benchmark/speed_benchmark.rst:423
-#: ../../source/benchmark/speed_benchmark.rst:425
-#: ../../source/benchmark/speed_benchmark.rst:429
-#: ../../source/benchmark/speed_benchmark.rst:431
-#: ../../source/benchmark/speed_benchmark.rst:433
-#: ../../source/benchmark/speed_benchmark.rst:437
-#: ../../source/benchmark/speed_benchmark.rst:439
-#: ../../source/benchmark/speed_benchmark.rst:441
-#: ../../source/benchmark/speed_benchmark.rst:447
-#: ../../source/benchmark/speed_benchmark.rst:449
-#: 28f5671ff9f0473bad8ecb1dc85fd4fd
-msgid "2"
+#: ../../source/benchmark/speed_benchmark.rst:503
+#: 7220fa270f3b44abb519241cfb34837d
+msgid "9.71"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:296
-#: cbe6a8135d7a499baf156fd40345740b
-msgid "4.76"
+#: ../../source/benchmark/speed_benchmark.rst:503
+#: 510e3a1a9e8c445eb3dbb3efaf9703df
+msgid "24.39"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:296
-#: ab154c4808ab43ddb8d3fefb070b94d8
-msgid "110.29"
+#: ../../source/benchmark/speed_benchmark.rst:505
+#: f28eeee663574673970f898def69eda5
+msgid "5.59"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:298
-#: 8224ca1206f64485bea889415b74ffcb
-msgid "5.55"
+#: ../../source/benchmark/speed_benchmark.rst:505
+#: 6afb3d7c36e34966b50f2d5896a30f13
+msgid "74.47"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:298
-#: f864b9292afa43d09b82cd8eaae4a7e8
-msgid "30.38"
+#: ../../source/benchmark/speed_benchmark.rst:507
+#: 6dfcb55205f14066946e98e5f21ebbb4
+msgid "5.42"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:300
-#: fb809e7ceace400b8a082acac8fd4c61
-msgid "4.90"
+#: ../../source/benchmark/speed_benchmark.rst:507
+#: f5e851a787604e9eb23881ae7a0823cf
+msgid "46.45"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:300
-#: 56194000744746b4a1e686600caf901c
-msgid "117.80"
+#: ../../source/benchmark/speed_benchmark.rst:509
+#: 3be974dc1add4a5f841c1a6615d37f55
+msgid "5.79"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:302
-#: 42beaf246b97438b8ee652873b047c52
-msgid "5.44"
+#: ../../source/benchmark/speed_benchmark.rst:509
+#: 0e8ee0a620134e9cad5a123e1f262a4e
+msgid "31.84"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:302
-#: c27750f759f24f85bef8955a43d19a98
-msgid "35.67"
+#: ../../source/benchmark/speed_benchmark.rst:511
+#: 96c5762c06af49c28c04c3dd86ab529e
+msgid "5.85"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:304
-#: 292a44b1c4f84f538a4681a1c9370a65
-msgid "4.58"
+#: ../../source/benchmark/speed_benchmark.rst:511
+#: ea0189c9871b493ea1e2fe8c76b8bf9c
+msgid "31.56"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:304
-#: 082119c4a2cf4f77a403a48e45bab6f1
-msgid "128.17"
+#: ../../source/benchmark/speed_benchmark.rst:518
+#: ad539924324348b79ae510fc747d0503
+msgid "32B (vLLM)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:306
-#: 64b7ab50e0324dd4b28a8e13165dcd50
-msgid "5.31"
+#: ../../source/benchmark/speed_benchmark.rst:523
+#: 18fbf0142b484b39bd993a5e2b7109f3
+msgid "22.13"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:306
-#: f567c71bdba14d1fbf5d6c0b9fb9b4fa
-msgid "43.11"
+#: ../../source/benchmark/speed_benchmark.rst:523
+#: ../../source/benchmark/speed_benchmark.rst:531
+#: ../../source/benchmark/speed_benchmark.rst:539
+#: ecb82e57e6174a228b0bb6c9408c857a
+msgid "setting1"
+msgstr "设定1"
+
+#: ../../source/benchmark/speed_benchmark.rst:525
+#: cbfd13d749b34ae8b58d71862a01a082
+msgid "37.57"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:308
-#: ../../source/benchmark/speed_benchmark.rst:366
-#: 46bce29cddac48c9ab1c86a77f39b99c
-msgid "4.12"
+#: ../../source/benchmark/speed_benchmark.rst:527
+#: dd354dc40c2b44e08ce16795be4d374d
+msgid "55.83"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:308
-#: 28b0a74ed44a4d4bb2848643d0cedb61
-msgid "163.77"
+#: ../../source/benchmark/speed_benchmark.rst:529
+#: e105e841443e475e82c1c979d5abd8da
+msgid "51.92"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:310
-#: b06bf0b700294d86b5e0b83fff8b5cdc
-msgid "4.72"
+#: ../../source/benchmark/speed_benchmark.rst:531
+#: 2c5ef86b9f184afd91b9c0813f0d2d9f
+msgid "21.05"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:310
-#: ef78f3c059084e2c851ccc57bef3fa60
-msgid "58.01"
+#: ../../source/benchmark/speed_benchmark.rst:533
+#: 8f1b7ee4927644218088c54cf274f304
+msgid "34.67"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:313
-#: a25143ba4b5b4e17a2c24ef133f808ba
-msgid "57B-A14B (vLLM)"
+#: ../../source/benchmark/speed_benchmark.rst:535
+#: 8970b876b77d43eea85a9999d7c0ec18
+msgid "49.96"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:318
-#: 0c60acb691614f19a388b98b290c5380
-msgid "31.44"
+#: ../../source/benchmark/speed_benchmark.rst:537
+#: 02d6ad974d384e42be987c076b6eb816
+msgid "46.68"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:320
-#: 2e50af3e813c47d28b950d78c92b06d6
-msgid "31.77"
+#: ../../source/benchmark/speed_benchmark.rst:539
+#: 0437b0be0e354954a91507554f9ac9bd
+msgid "19.91"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:322
-#: 670e9d878d6c4ff78cdf4f890c4fb163
-msgid "21.25"
+#: ../../source/benchmark/speed_benchmark.rst:541
+#: d48dd2bea93e4b119300d5cbf9f3817c
+msgid "31.89"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:324
-#: 05c9e0c8cf4d4eb1b7c896c748b8e40a
-msgid "20.24"
+#: ../../source/benchmark/speed_benchmark.rst:543
+#: 76a8cb67eb034e71b7cea12c30dcec9a
+msgid "44.79"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:327
-#: 3afade020cfb40c3ac2521327ceb9278
-msgid "Note: Compared with dense models, MOE models have larger throughput when batch size is large, which is shown as follows:"
-msgstr "混合专家模型 (Mixture-of-Experts, MoE) 与稠密模型相比,当批大小较大时,吞吐量更大。下表展示了有关数据:"
+#: ../../source/benchmark/speed_benchmark.rst:545
+#: 5eeed726d7584af382aef9f9b60f2568
+msgid "41.83"
+msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:330
-#: 2e819ceb9e43455ab51c447ce96a3d56
-msgid "# Prompts"
-msgstr "请求数"
+#: ../../source/benchmark/speed_benchmark.rst:547
+#: ../../source/benchmark/speed_benchmark.rst:555
+#: ../../source/benchmark/speed_benchmark.rst:563
+#: ../../source/benchmark/speed_benchmark.rst:585
+#: ../../source/benchmark/speed_benchmark.rst:587
+#: ../../source/benchmark/speed_benchmark.rst:593
+#: ../../source/benchmark/speed_benchmark.rst:595
+#: ../../source/benchmark/speed_benchmark.rst:603
+#: ../../source/benchmark/speed_benchmark.rst:611
+#: ../../source/benchmark/speed_benchmark.rst:613
+#: ../../source/benchmark/speed_benchmark.rst:615
+#: ../../source/benchmark/speed_benchmark.rst:626
+#: ../../source/benchmark/speed_benchmark.rst:630
+#: ../../source/benchmark/speed_benchmark.rst:634
+#: ../../source/benchmark/speed_benchmark.rst:636
+#: ../../source/benchmark/speed_benchmark.rst:640
+#: ../../source/benchmark/speed_benchmark.rst:644
+#: ../../source/benchmark/speed_benchmark.rst:646
+#: ../../source/benchmark/speed_benchmark.rst:650
+#: ../../source/benchmark/speed_benchmark.rst:652
+#: ../../source/benchmark/speed_benchmark.rst:654
+#: ../../source/benchmark/speed_benchmark.rst:658
+#: ../../source/benchmark/speed_benchmark.rst:660
+#: ../../source/benchmark/speed_benchmark.rst:662
+#: ../../source/benchmark/speed_benchmark.rst:666
+#: ../../source/benchmark/speed_benchmark.rst:668
+#: ../../source/benchmark/speed_benchmark.rst:670
+#: ../../source/benchmark/speed_benchmark.rst:676
+#: ../../source/benchmark/speed_benchmark.rst:678
+#: 1de1b78c409742d48003dd118a0bd0b6
+msgid "2"
+msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:330
-#: 4e979ae74bab41e38717901f7f96171c
-msgid "QPS"
-msgstr "请求每秒 (QPS)"
+#: ../../source/benchmark/speed_benchmark.rst:547
+#: c90999f095754750947c1f9754fec8b7
+msgid "31.82"
+msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:330
-#: c0c85f59ac994011a81dfcc84f7ecdae
-msgid "Tokens/s"
-msgstr "速度 (tokens/s)"
+#: ../../source/benchmark/speed_benchmark.rst:549
+#: bffb7c59dfd047a7bb148c720dfc3559
+msgid "26.88"
+msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:332
-#: ../../source/benchmark/speed_benchmark.rst:336
-#: 1776dcc8818644d58d5933f9933e4e4e
-msgid "Qwen1.5-32B-Chat"
+#: ../../source/benchmark/speed_benchmark.rst:551
+#: 05715030d07b43be973f53a11c340488
+msgid "35.66"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:332
-#: ../../source/benchmark/speed_benchmark.rst:334
-#: 0952bdd1354240ee9d60e7925c4ea42b
-msgid "100"
+#: ../../source/benchmark/speed_benchmark.rst:553
+#: d64faf27fa474f658cdb17f96a11c176
+msgid "33.75"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:332
-#: 6eae8bcf6ff946b8bea2957b3ded71be
-msgid "6.68"
+#: ../../source/benchmark/speed_benchmark.rst:555
+#: 8ee73b440af9496e99d2c9fa8c3aab47
+msgid "24.45"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:332
-#: 53f595571768489ba575c8b12fef3ec3
-msgid "7343.56"
+#: ../../source/benchmark/speed_benchmark.rst:557
+#: 8232d17592dc40879634997bb2ec8a62
+msgid "18.60"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:334
-#: f9f2397a1baf4e09814a418c5f61a592
-msgid "4.81"
+#: ../../source/benchmark/speed_benchmark.rst:559
+#: 957977d138a14fa49556b6d85ae09f98
+msgid "22.72"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:334
-#: d3cc853999fa404bbcb2f40d9ba74487
-msgid "5291.15"
+#: ../../source/benchmark/speed_benchmark.rst:561
+#: 8db19c15901344f193ffe475e337ce6d
+msgid "21.79"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:336
-#: ../../source/benchmark/speed_benchmark.rst:338
-#: c2e443593c0a4c4c8805450f7f4437c0
-msgid "1000"
+#: ../../source/benchmark/speed_benchmark.rst:563
+#: 78f2313954304662b22171c161b2545b
+msgid "14.31"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:336
-#: 3cc76ffdf7e849d0a4d8dc36eb654fdf
-msgid "7.99"
+#: ../../source/benchmark/speed_benchmark.rst:565
+#: 0b3597500eb04bee9b87863dd9a7f5cf
+msgid "9.77"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:336
-#: 8683668f120647d8a234357dfab12b47
-msgid "8791.35"
+#: ../../source/benchmark/speed_benchmark.rst:567
+#: 9da2d0f4f4674cf78f41c29f0353e94f
+msgid "10.39"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:338
-#: 4a449baa6803435ba984163e7663cc5a
-msgid "5.18"
+#: ../../source/benchmark/speed_benchmark.rst:569
+#: 0ab3a055fbb44fc2816bc11cd0bf334c
+msgid "10.34"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:338
-#: 6a62898ae87f4b3ea31c871f5eacb53a
-msgid "5698.37"
+#: ../../source/benchmark/speed_benchmark.rst:572
+#: 7ec41a4f21564751a5e286349bd2973f
+msgid "For context length 129024, the model needs to be predicted with the following config: \"model_max_length\"=131072"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:341
-#: 092e36507d1d48638cd439cefe2a9e77
-msgid "The results are obtained from vLLM throughput benchmarking scripts, which can be reproduced by:"
-msgstr "数据由vLLM吞吐量测试脚本测得,可通过以下命令复现"
+#: ../../source/benchmark/speed_benchmark.rst:573
+#: ../../source/benchmark/speed_benchmark.rst:681
+#: 6bcfd786e43e47fcbe60a1636a980420
+msgid "[Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)"
+msgstr "[默认设定]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)"
 
-#: ../../source/benchmark/speed_benchmark.rst:343
-#: ffdbcf34e9d64bfbac5cf88eef55cf76
-msgid "``python vllm/benchmarks/benchmark_throughput.py --input-len 1000 --output-len 100 --model <model_path> --num-prompts <number of prompts> --enforce-eager -tp 2``"
-msgstr ""
+#: ../../source/benchmark/speed_benchmark.rst:574
+#: 52ff1ba434ec4f5aa779f7c1391a4888
+msgid "[Setting 1]=(gpu_memory_utilization=1.0 max_model_len=32768 enforce_eager=True)"
+msgstr "[设定 1]=(gpu_memory_utilization=1.0 max_model_len=32768 enforce_eager=True)"
 
-#: ../../source/benchmark/speed_benchmark.rst:345
-#: 4f799c9a507b45f0a6eeed9887fce4a9
+#: ../../source/benchmark/speed_benchmark.rst:580
+#: 35f9135bf6704976b2a4a2d1cbef1e42
 msgid "72B (Transformer)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:350
-#: ../../source/benchmark/speed_benchmark.rst:389
-#: d179bb53b2774464b254c8fc169c9125
-msgid "Qwen2-72B-Instruct"
-msgstr ""
-
-#: ../../source/benchmark/speed_benchmark.rst:350
-#: c60dad5fafab4a0ab8409ff5993fc81c
-msgid "7.45"
+#: ../../source/benchmark/speed_benchmark.rst:585
+#: ../../source/benchmark/speed_benchmark.rst:626
+#: 1ec160dc664d4ea48527cb2ee63842ae
+msgid "Qwen2.5-72B-Instruct"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:350
-#: f3bb0a3fd6874bc39ae8bf32e50c5708
-msgid "134.74"
+#: ../../source/benchmark/speed_benchmark.rst:585
+#: 0cd29021195b4deda82222fd1c5b2d89
+msgid "8.73"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:352
-#: 3a199090c9d64c259a408eaeb48d481d
-msgid "7.30"
+#: ../../source/benchmark/speed_benchmark.rst:585
+#: 4759248c3efe417cbfab580cc75d6e9b
+msgid "136.20"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:352
-#: c0ce906ec79347419857b7018238cdcb
-msgid "71.00"
+#: ../../source/benchmark/speed_benchmark.rst:587
+#: 4e5831dfca684eba87e0d9bb1e057f89
+msgid "8.66"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:354
-#: 4e60bd8a343a413080a258f4e3972257
-msgid "9.05"
+#: ../../source/benchmark/speed_benchmark.rst:587
+#: 8191363491464f02952e9df810b72dad
+msgid "72.61"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:354
-#: 95fe8e443bb34e579229392653e8f2f2
-msgid "41.80"
+#: ../../source/benchmark/speed_benchmark.rst:589
+#: ee80f21a647b48758fa4cb6e486345df
+msgid "11.07"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:356
-#: e3f3148f13334ec18395e17ee0b22e1f
-msgid "9.96"
+#: ../../source/benchmark/speed_benchmark.rst:589
+#: 21fa5feb12eb46ec89b08f36aecc5f64
+msgid "39.91"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:356
-#: b4fb66e12fe54fefb762c7cb37beb7b2
-msgid "41.31"
+#: ../../source/benchmark/speed_benchmark.rst:591
+#: eae85aa1efdd4ed189e84e5bacb0c943
+msgid "11.50"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:358
-#: 51d1ef53b0894cbda6f5876ab5f2c26b
-msgid "5.99"
+#: ../../source/benchmark/speed_benchmark.rst:591
+#: 8c37bed4817049b4bbf221e225da2c02
+msgid "39.44"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:358
-#: 5e2eed3a83fc4f468d2c8cd2e8e8ed2c
-msgid "144.38"
+#: ../../source/benchmark/speed_benchmark.rst:593
+#: 318376c8783f447092b4da185e8d6f55
+msgid "140.00"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:360
-#: 6db8fd30a9ca466b847825fdcd5cc37d
-msgid "80.60"
+#: ../../source/benchmark/speed_benchmark.rst:595
+#: 6ae597ed16c04855858e32a0f81a3318
+msgid "77.81"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:362
-#: 64b0147addbe404693cb0edb5b8cf69b
-msgid "6.79"
+#: ../../source/benchmark/speed_benchmark.rst:597
+#: 8512820f22034145bab9b4209b052870
+msgid "7.56"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:362
-#: 703be6e494414d6d980c4f4af7973c65
-msgid "47.90"
+#: ../../source/benchmark/speed_benchmark.rst:597
+#: ../../source/benchmark/speed_benchmark.rst:644
+#: 1189fab08ecf47788307b9fcd311d93f
+msgid "42.50"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:364
-#: f77a7042530646d0bbd6596dc3b04e05
-msgid "7.49"
+#: ../../source/benchmark/speed_benchmark.rst:599
+#: 4a920e9f493b4fe99016d18f1c7a335a
+msgid "8.17"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:364
-#: f547fb57b33f48f1a77dc065110be0d3
-msgid "47.42"
+#: ../../source/benchmark/speed_benchmark.rst:599
+#: 0b23f7fda490423bab9b2710090ba354
+msgid "42.13"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:366
-#: ../../source/benchmark/speed_benchmark.rst:374
-#: 3089d24bda534c67b5be05e281ad64e3
+#: ../../source/benchmark/speed_benchmark.rst:601
+#: ../../source/benchmark/speed_benchmark.rst:609
+#: 64a5def0a6ac4f68b05c68a21d7e7b73
 msgid "3"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:366
-#: ffa5f85189ff4b128cc169437949572f
-msgid "169.93"
+#: ../../source/benchmark/speed_benchmark.rst:601
+#: 3ba1fb9c27dc43948e863b2eb1373e84
+msgid "4.25"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:368
-#: e9903e0519a24d94b06fd64480743688
-msgid "4.43"
+#: ../../source/benchmark/speed_benchmark.rst:601
+#: d934eb7c7fd5409ea165c5401ddc40e3
+msgid "149.14"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:368
-#: 154c3b57143d4b92b5ee1c95e636ffe7
-msgid "95.14"
+#: ../../source/benchmark/speed_benchmark.rst:603
+#: 5ec8fd8fdce54044bac0d826833bb3c3
+msgid "4.66"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:370
-#: 19c10d1f9e284e18819df8e2180c4590
-msgid "4.87"
+#: ../../source/benchmark/speed_benchmark.rst:603
+#: 3a47334ad32942cab67ca6e033d64d33
+msgid "82.55"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:370
-#: 85574d14ce2d43748b53c5ab5cc3a05b
-msgid "57.79"
+#: ../../source/benchmark/speed_benchmark.rst:605
+#: d4e6a932094f45329dc9451c9cdbf5ce
+msgid "5.27"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:372
-#: 0d987bcce064487681d5cc4e6affd306
-msgid "5.23"
+#: ../../source/benchmark/speed_benchmark.rst:605
+#: 16b4a09390ad433982e7cce6d772ab67
+msgid "46.86"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:372
-#: a49809ae1c804efc82256b58e1c8fa5a
-msgid "57.30"
+#: ../../source/benchmark/speed_benchmark.rst:607
+#: 7290740f09ed4f5e8056bd5a73919eb1
+msgid "5.57"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:374
-#: d5e76c1fdbbc461996421d71b7a864db
-msgid "2.86"
+#: ../../source/benchmark/speed_benchmark.rst:607
+#: bb2c46fd7b4b4d9b9eee27796228d541
+msgid "46.38"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:374
-#: b4b313dbb165470ab875032d569fa2c6
-msgid "209.03"
+#: ../../source/benchmark/speed_benchmark.rst:609
+#: ../../source/benchmark/speed_benchmark.rst:611
+#: d73af4dd10734038b7ebf958929ddb3c
+msgid "2.94"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:376
-#: 254f53b31b6141478bf4c914cd3303c0
-msgid "2.83"
+#: ../../source/benchmark/speed_benchmark.rst:609
+#: 314958f218a144f6bae0e7b7c666a87b
+msgid "164.79"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:376
-#: 8c14e41ad0cc4567ba1a1bc4a5e55194
-msgid "124.20"
+#: ../../source/benchmark/speed_benchmark.rst:611
+#: b9669600e20449f19330a26d4ae24d8b
+msgid "94.75"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:378
-#: 7338224b41e04ba988aee3b919a63c19
-msgid "3.02"
+#: ../../source/benchmark/speed_benchmark.rst:613
+#: bab0c51f5f434b9daa149dd8a2dbf518
+msgid "3.14"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:378
-#: da4e3178c39a4c8284fa63428a298ece
-msgid "107.94"
+#: ../../source/benchmark/speed_benchmark.rst:613
+#: 21b5b4b6e2774b4bbf7a0161a1315215
+msgid "62.57"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:380
-#: d0c944ecc68d4c79a1cabc605aafdd3b
-msgid "1.85"
+#: ../../source/benchmark/speed_benchmark.rst:615
+#: 71d2e7a03a7648ec9b3c824d1b3e6be6
+msgid "3.23"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:380
-#: 83404c21d99d47568ada6ca97e4dc08c
-msgid "88.60"
+#: ../../source/benchmark/speed_benchmark.rst:615
+#: 8ddc63ac4ac3441d82ba48a76a168857
+msgid "61.64"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:384
-#: 01710f215568440cb8193443cb4d2d11
+#: ../../source/benchmark/speed_benchmark.rst:621
+#: 1fa49ab15da7411f9e2288fa92b39adc
 msgid "72B (vLLM)"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:387
-#: b426a4c94d9e46d0acb50f650c5aba1d
-msgid "Setting"
+#: ../../source/benchmark/speed_benchmark.rst:626
+#: 4b9dc207673447b69eeb39318d74faff
+msgid "18.19"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:389
-#: e69c240d64334418a00fd79fb5374896
-msgid "17.68"
-msgstr ""
+#: ../../source/benchmark/speed_benchmark.rst:626
+#: 8cdd46c8c8c64b749e734e82f604941d
+msgid "Setting 1"
+msgstr "设定1"
 
-#: ../../source/benchmark/speed_benchmark.rst:389
-#: 54209a9ed7064ea5999b591f3b581c85
-msgid "[Setting 1]"
+#: ../../source/benchmark/speed_benchmark.rst:628
+#: ../../source/benchmark/speed_benchmark.rst:638
+#: ../../source/benchmark/speed_benchmark.rst:648
+#: ../../source/benchmark/speed_benchmark.rst:656
+#: ../../source/benchmark/speed_benchmark.rst:664
+#: ../../source/benchmark/speed_benchmark.rst:672
+#: ../../source/benchmark/speed_benchmark.rst:674
+#: 6cb2bb1f8236463b82980661f2b26187
+msgid "4"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:391
-#: ../../source/benchmark/speed_benchmark.rst:401
-#: ../../source/benchmark/speed_benchmark.rst:411
-#: ../../source/benchmark/speed_benchmark.rst:419
-#: ../../source/benchmark/speed_benchmark.rst:427
-#: ../../source/benchmark/speed_benchmark.rst:435
-#: ../../source/benchmark/speed_benchmark.rst:443
-#: ../../source/benchmark/speed_benchmark.rst:445
-#: 44fd7b7e19aa411785e2db8e02443b34
-msgid "4"
+#: ../../source/benchmark/speed_benchmark.rst:628
+#: dbe95233ca094aad9620b1c5a2392f18
+msgid "31.37"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:391
-#: 8b7ff0d4a4eb4be1b88d5b41eff75c8f
-msgid "30.01"
+#: ../../source/benchmark/speed_benchmark.rst:628
+#: ../../source/benchmark/speed_benchmark.rst:630
+#: ../../source/benchmark/speed_benchmark.rst:632
+#: ../../source/benchmark/speed_benchmark.rst:636
+#: ../../source/benchmark/speed_benchmark.rst:638
+#: ../../source/benchmark/speed_benchmark.rst:640
+#: ../../source/benchmark/speed_benchmark.rst:642
+#: ../../source/benchmark/speed_benchmark.rst:646
+#: ../../source/benchmark/speed_benchmark.rst:648
+#: ../../source/benchmark/speed_benchmark.rst:650
+#: ../../source/benchmark/speed_benchmark.rst:652
+#: ../../source/benchmark/speed_benchmark.rst:654
+#: ../../source/benchmark/speed_benchmark.rst:656
+#: ../../source/benchmark/speed_benchmark.rst:658
+#: ../../source/benchmark/speed_benchmark.rst:660
+#: ../../source/benchmark/speed_benchmark.rst:662
+#: ee5778f9148049669096d6426455133f
+msgid "Default"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:393
-#: 975913f1bbec4e149dcc576aba012b16
-msgid "27.56"
+#: ../../source/benchmark/speed_benchmark.rst:630
+#: 6f55f6473c2c41a580d423528bac5899
+msgid "31.40"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:395
-#: 4a055963a1eb4098aaed27e3b21b71b2
-msgid "29.60"
+#: ../../source/benchmark/speed_benchmark.rst:632
+#: 074e04d74f81418bb8900529ee718191
+msgid "16.47"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:395
-#: 74c3cf2b89f5438ab0f20f7d36a010b2
-msgid "[Setting 2]"
+#: ../../source/benchmark/speed_benchmark.rst:634
+#: 06343656f08748a2b10154bae4376207
+msgid "Setting 2"
 msgstr "[设定2]"
 
-#: ../../source/benchmark/speed_benchmark.rst:397
-#: 03bfb2c5a69b4d3482518577c32fa39a
-msgid "42.82"
+#: ../../source/benchmark/speed_benchmark.rst:636
+#: 366ca98c00a04cfeb8a0284291994bbf
+msgid "44.30"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:401
-#: 3b8fe08697904a03bd49f267e9a8c223
-msgid "27.98"
+#: ../../source/benchmark/speed_benchmark.rst:638
+#: f43dfd3b925d4985a3efdf2e7da4ea29
+msgid "29.90"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:403
-#: dcc68256009b4048aa3bc8a1d9fa326b
-msgid "25.46"
+#: ../../source/benchmark/speed_benchmark.rst:640
+#: 64d5ac1585ac4bf4a7b862bae489164a
+msgid "29.37"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:405
-#: bca0bc1c87c5451fb0eaffc5abe9ceff
-msgid "25.16"
+#: ../../source/benchmark/speed_benchmark.rst:642
+#: 1de86b60d4c24dcca314e4818e07db24
+msgid "13.88"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:405
-#: 650de88091f441869c90042ec8034a97
-msgid "[Setting 3]"
+#: ../../source/benchmark/speed_benchmark.rst:644
+#: b204f3c52579498683f909bc570a588a
+msgid "Setting 3"
 msgstr "[设定3]"
 
-#: ../../source/benchmark/speed_benchmark.rst:407
-#: 3c0e2a52803d4d22af83fb75abbf5233
-msgid "38.23"
+#: ../../source/benchmark/speed_benchmark.rst:646
+#: c12d1b7abd0a464eacfae7d12413f732
+msgid "40.67"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:409
-#: 631ea6009552452b8fa46934e8e0f0d0
-msgid "25.77"
+#: ../../source/benchmark/speed_benchmark.rst:648
+#: aa72f60f839e4f008ba928e0c220f10e
+msgid "30.10"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:411
-#: 8bf01519422a45d8a2e84533c6a20c06
-msgid "21.81"
+#: ../../source/benchmark/speed_benchmark.rst:650
+#: 14a54dd72ce84f08a441f75db838f211
+msgid "27.20"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:413
-#: 0b688d7ae4a6418da298b6f3b12fe4b3
-msgid "22.71"
+#: ../../source/benchmark/speed_benchmark.rst:652
+#: 0d272d38659346aa85564e96df220ef6
+msgid "38.10"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:415
-#: 4b7e706fa6314752af02fbf0d293121f
-msgid "26.54"
+#: ../../source/benchmark/speed_benchmark.rst:654
+#: 9009eaa7925f4389bd0727abc502ad7a
+msgid "36.63"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:417
-#: 7c232a8ad11c47afaa0143d9db108b19
-msgid "21.50"
+#: ../../source/benchmark/speed_benchmark.rst:656
+#: 9fd7ffbf2ad24498af10753fbde5d7b3
+msgid "27.53"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:419
-#: ../../source/benchmark/speed_benchmark.rst:427
-#: 3aaf3ae8d8714b0e9c9f14031b5d97a9
-msgid "19.43"
+#: ../../source/benchmark/speed_benchmark.rst:658
+#: 14c6b382ca474a1fb361af0d5555ab54
+msgid "23.32"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:421
-#: ../../source/benchmark/speed_benchmark.rst:429
-#: 4d8b93b402b94ceba65c5da2383327f6
-msgid "18.69"
+#: ../../source/benchmark/speed_benchmark.rst:660
+#: 0a43a0a7e072497eb2ee37babdbe74ba
+msgid "30.98"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:423
-#: ../../source/benchmark/speed_benchmark.rst:431
-#: cf8c56e770ac439fb0e668dd1f4bd745
-msgid "23.12"
+#: ../../source/benchmark/speed_benchmark.rst:662
+#: 377248d5998644e5a92b276d15b2304b
+msgid "30.02"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:425
-#: ../../source/benchmark/speed_benchmark.rst:433
-#: d7b01f09c1d041029296a4d89f0685ae
-msgid "18.09"
+#: ../../source/benchmark/speed_benchmark.rst:664
+#: 2a7c8f06cdc84e39a27adbf50ec9e264
+msgid "20.74"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:435
-#: 158d7f4cd81b4bcbbcab4f470074c0a4
-msgid "17.46"
-msgstr ""
+#: ../../source/benchmark/speed_benchmark.rst:664
+#: ../../source/benchmark/speed_benchmark.rst:666
+#: ../../source/benchmark/speed_benchmark.rst:668
+#: ../../source/benchmark/speed_benchmark.rst:670
+#: 48fa6349dfa84ceb9647fe94dfb1b0f1
+msgid "Setting 4"
+msgstr "设定4"
 
-#: ../../source/benchmark/speed_benchmark.rst:437
-#: 70c34f305d3f4b9d931355d1d48832dd
-msgid "15.30"
+#: ../../source/benchmark/speed_benchmark.rst:666
+#: 871c974bc23d486c893d3a5609276cf1
+msgid "16.27"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:439
-#: 162b19a8a6ae46129942682f783c3606
-msgid "13.23"
+#: ../../source/benchmark/speed_benchmark.rst:668
+#: b8eb841787d74da3942f972917292c67
+msgid "19.84"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:441
-#: 455093999fa9451cae981ff12b01cd50
-msgid "13.14"
+#: ../../source/benchmark/speed_benchmark.rst:670
+#: 16a441b4182a4eb1a7d4bce1c5368516
+msgid "19.32"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:443
-#: 8411be189ad24683af4e17fc8f2ac222
-msgid "11.70"
+#: ../../source/benchmark/speed_benchmark.rst:672
+#: 9c4b458f6fd24ae8a6e623fac9b35775
+msgid "12.68"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:445
-#: 4e4131989f7f4e57b75a6f14636c3c2c
-msgid "12.94"
-msgstr ""
+#: ../../source/benchmark/speed_benchmark.rst:672
+#: ../../source/benchmark/speed_benchmark.rst:674
+#: ../../source/benchmark/speed_benchmark.rst:676
+#: ../../source/benchmark/speed_benchmark.rst:678
+#: b728aa33bc8a43d48e30bb7c6eda12c3
+msgid "Setting 5"
+msgstr "设定5"
 
-#: ../../source/benchmark/speed_benchmark.rst:447
-#: 65fd14035c9d401f98f9013ad13e3db3
-msgid "8.33"
+#: ../../source/benchmark/speed_benchmark.rst:674
+#: 57c759533ff04e7cb7f6ceb027d80692
+msgid "14.11"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:449
-#: e6065cf3a98d4dc5be1c8ac43c759743
-msgid "7.78"
+#: ../../source/benchmark/speed_benchmark.rst:676
+#: 8fdea9ad04aa4608b5f839f3d923205c
+msgid "10.11"
 msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:452
-#: 984f697c77ad416480978c2e703d3281
-msgid "[Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)"
-msgstr "[默认设定]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)"
+#: ../../source/benchmark/speed_benchmark.rst:678
+#: 6a52a106dc724a8285a38b543acda27e
+msgid "9.88"
+msgstr ""
 
-#: ../../source/benchmark/speed_benchmark.rst:453
-#: 1a3dd3717c5b4dcebbd12bcfe0fbc33e
+#: ../../source/benchmark/speed_benchmark.rst:682
+#: 7f20ec685e1b4e0eb0a606369efb844d
 msgid "[Setting 1]=(gpu_memory_utilization=0.98 max_model_len=4096 enforce_eager=True)"
 msgstr "[设定 1]=(gpu_memory_utilization=0.98 max_model_len=4096 enforce_eager=True)"
 
-#: ../../source/benchmark/speed_benchmark.rst:454
-#: ff06cfda1923459a9fbc40c68aa8359a
+#: ../../source/benchmark/speed_benchmark.rst:683
+#: a750af24c71b4094994ccf9dc8b87986
 msgid "[Setting 2]=(gpu_memory_utilization=1.0 max_model_len=4096 enforce_eager=True)"
 msgstr "[设定 2]=(gpu_memory_utilization=1.0 max_model_len=4096 enforce_eager=True)"
 
-#: ../../source/benchmark/speed_benchmark.rst:455
-#: adb3086724db4afa97ba526f69bd15b3
+#: ../../source/benchmark/speed_benchmark.rst:684
+#: 8fd1c1acd99f4f7eb4766030a56823e5
 msgid "[Setting 3]=(gpu_memory_utilization=1.0 max_model_len=8192 enforce_eager=True)"
 msgstr "[设定 3]=(gpu_memory_utilization=1.0 max_model_len=8192 enforce_eager=True)"
 
+#: ../../source/benchmark/speed_benchmark.rst:685
+#: 70b88bc91b9848bdb4901f8490275db4
+msgid "[Setting 4]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)"
+msgstr "[设定 4]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)"
+
+#: ../../source/benchmark/speed_benchmark.rst:686
+#: 6eed31e9316b4d64b8c1d105884524fc
+msgid "[Setting 5]=(gpu_memory_utilization=0.9 max_model_len=131072 enforce_eager=False)"
+msgstr "[设定 5]=(gpu_memory_utilization=0.9 max_model_len=131072 enforce_eager=False)"
diff --git a/docs/source/benchmark/speed_benchmark.rst b/docs/source/benchmark/speed_benchmark.rst
index 297cf717e3ae22e90fd30903be3844937871e59e..8585362e778f33e544c83ecb2d0daa92760cbcbf 100644
--- a/docs/source/benchmark/speed_benchmark.rst
+++ b/docs/source/benchmark/speed_benchmark.rst
@@ -11,21 +11,21 @@ The environment of the evaluation with huggingface transformers is:
 
 -  NVIDIA A100 80GB
 -  CUDA 12.1
--  torch==2.3.1
--  flash_attn==2.5.8
--  transformers==4.46.0
--  auto_gptq==0.7.1+cu1210 (Compiled from source code)
--  autoawq==0.2.6
+-  PyTorch 2.3.1
+-  Flash Attention 2.5.8
+-  Transformers 4.46.0
+-  AutoGPTQ 0.7.1+cu121 (Compiled from source code)
+-  AutoAWQ 0.2.6
 
 
 The environment of the evaluation with vLLM is:
 
 -  NVIDIA A100 80GB
 -  CUDA 12.1
--  vllm==0.6.3
--  torch==2.4.0
--  flash_attn==2.6.3
--  transformers==4.46.0
+-  vLLM 0.6.3
+-  PyTorch 2.4.0
+-  Flash Attention 2.6.3
+-  Transformers 4.46.0
 
 
 Notes:
@@ -43,42 +43,41 @@ Notes:
 
 -  0.5B (Transformer)
 
-+-------------------------+--------------+--------------+---------+-----------------+----------------+
-| Model                   | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
-+=========================+==============+==============+=========+=================+================+
-| Qwen2.5-0.5B-Instruct   | 1            | BF16         | 1       | 47.40           | 0.97           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int8    | 1       | 35.17           | 0.64           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int4    | 1       | 50.60           | 0.48           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | AWQ          | 1       | 37.09           | 0.68           |
-+                         +--------------+--------------+---------+-----------------+----------------+
-|                         | 6144         | BF16         | 1       | 47.45           | 1.23           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int8    | 1       | 36.47           | 0.90           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int4    | 1       | 48.89           | 0.73           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | AWQ          | 1       | 37.04           | 0.72           |
-+                         +--------------+--------------+---------+-----------------+----------------+
-|                         | 14336        | BF16         | 1       | 47.11           | 1.60           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int8    | 1       | 35.44           | 1.26           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int4    | 1       | 48.26           | 1.10           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | AWQ          | 1       | 37.14           | 1.10           |
-+                         +--------------+--------------+---------+-----------------+----------------+
-|                         | 30720        | BF16         | 1       | 47.16           | 2.34           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int8    | 1       | 36.25           | 2.01           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | GPTQ-Int4    | 1       | 49.22           | 1.85           |
-+                         +              +--------------+---------+-----------------+----------------+
-|                         |              | AWQ          | 1       | 36.90           | 1.84           |
-+-------------------------+--------------+--------------+---------+-----------------+----------------+
-
++-------------------------+--------------+--------------+---------+-----------------+----------------+---------------------------+
+| Model                   | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                      |
++=========================+==============+==============+=========+=================+================+===========================+
+| Qwen2.5-0.5B-Instruct   | 1            | BF16         | 1       | 47.40           | 0.97           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int8    | 1       | 35.17           | 0.64           | auto_gptq==0.6.0+cu1210   |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int4    | 1       | 50.60           | 0.48           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | AWQ          | 1       | 37.09           | 0.68           |                           |
++                         +--------------+--------------+---------+-----------------+----------------+---------------------------+
+|                         | 6144         | BF16         | 1       | 47.45           | 1.23           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int8    | 1       | 36.47           | 0.90           | auto_gptq==0.6.0+cu1210   |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int4    | 1       | 48.89           | 0.73           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | AWQ          | 1       | 37.04           | 0.72           |                           |
++                         +--------------+--------------+---------+-----------------+----------------+---------------------------+
+|                         | 14336        | BF16         | 1       | 47.11           | 1.60           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int8    | 1       | 35.44           | 1.26           | auto_gptq==0.6.0+cu1210   |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int4    | 1       | 48.26           | 1.10           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | AWQ          | 1       | 37.14           | 1.10           |                           |
++                         +--------------+--------------+---------+-----------------+----------------+---------------------------+
+|                         | 30720        | BF16         | 1       | 47.16           | 2.34           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int8    | 1       | 36.25           | 2.01           | auto_gptq==0.6.0+cu1210   |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | GPTQ-Int4    | 1       | 49.22           | 1.85           |                           |
++                         +              +--------------+---------+-----------------+----------------+---------------------------+
+|                         |              | AWQ          | 1       | 36.90           | 1.84           |                           |
++-------------------------+--------------+--------------+---------+-----------------+----------------+---------------------------+
 
 
 -  0.5B (vLLM)
@@ -124,7 +123,7 @@ Notes:
 -  1.5B (Transformer)
 
 +--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
-| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s)  | GPU Memory(GB) | Note                   |
+| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
 +==========================+==============+==============+=========+=================+================+=========================+
 | Qwen2.5-1.5B-Instruct    | 1            | BF16         | 1       | 39.68           | 2.95           |                         |
 +                          +              +--------------+---------+-----------------+----------------+-------------------------+
@@ -203,7 +202,7 @@ Notes:
 -  3B (Transformer)
 
 +--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
-| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s)  | GPU Memory(GB) | Note                   |
+| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
 +==========================+==============+==============+=========+=================+================+=========================+
 | Qwen2.5-3B-Instruct      | 1            | BF16         | 1       | 30.80           | 5.95           |                         |
 +                          +              +--------------+---------+-----------------+----------------+-------------------------+
@@ -282,7 +281,7 @@ Notes:
 -  7B (Transformer)
 
 +-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
-| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s)  | GPU Memory(GB) | Note                   |
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
 +=============================+==============+==============+=========+=================+================+=========================+
 | Qwen2.5-7B-Instruct         | 1            | BF16         | 1       | 40.38           | 14.38          |                         |
 +                             +              +--------------+---------+-----------------+----------------+-------------------------+
@@ -321,63 +320,65 @@ Notes:
 
 -  7B (vLLM)
 
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s)  | GPU Memory(GB)| Note                                      |
-+=============================+==============+==============+=========+=================+================+===========================================+
-| Qwen2.5-7B-Instruct         | 1            | BF16         | 1       | 84.28           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 122.01          |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 154.05          |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 148.10          |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 6144         | BF16         | 1       | 80.70           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 112.38          |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 141.98          |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 137.64          |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 14336        | BF16         | 1       | 77.69           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 105.25          |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 129.35          |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 124.91          |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 30720        | BF16         | 1       | 70.33           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 90.71           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 108.30          |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 104.66          |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 63488        | BF16         | 1       | 50.86           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 60.52           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 67.97           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 66.42           |                | setting-64k                               |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 129024       | BF16         | 1       | 28.94           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 25.97           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 26.37           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 26.57           |                | vllm==0.6.2, new sample config            |
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-  * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
++=============================+==============+==============+=========+=================+===========================================+
+| Qwen2.5-7B-Instruct         | 1            | BF16         | 1       | 84.28           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 122.01          |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 154.05          |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 148.10          |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 80.70           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 112.38          |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 141.98          |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 137.64          |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 77.69           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 105.25          |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 129.35          |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 124.91          |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 1       | 70.33           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 90.71           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 108.30          |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 104.66          |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 1       | 50.86           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 60.52           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 67.97           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 66.42           | setting-64k                               |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 1       | 28.94           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 25.97           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 26.37           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 26.57           | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+
+* [Setting-64k]: gpu_memory_utilization=0.9, max_model_len=65536, enforce_eager=False
+* [new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1, presence_penalty=0, frequency_penalty=0, max_tokens=out_length)
 
 - 14B (Transformer)
 
 +--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
-| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s)  | GPU Memory(GB) | Note                   |
+| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
 +==========================+==============+==============+=========+=================+================+=========================+
 | Qwen2.5-14B-Instruct     | 1            | BF16         | 1       | 24.74           | 28.08          |                         |
 +                          +              +--------------+---------+-----------------+----------------+-------------------------+
@@ -415,58 +416,60 @@ Notes:
 
 - 14B (vLLM)
 
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s)  | GPU Memory(GB)| Note                                      |
-+=============================+==============+==============+=========+=================+================+===========================================+
-| Qwen2.5-14B-Instruct        | 1            | BF16         | 1       | 46.30           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 70.40           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 98.02           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 92.66           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 6144         | BF16         | 1       | 43.83           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 64.33           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 86.10           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 83.11           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 14336        | BF16         | 1       | 41.91           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 59.21           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 76.85           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 74.03           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 30720        | BF16         | 1       | 37.18           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 49.23           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 60.91           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 59.01           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 63488        | BF16         | 1       | 26.85           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 32.83           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 37.67           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 36.71           |                | setting-64k                               |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 129024       | BF16         | 1       | 14.53           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 15.10           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 15.13           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 15.25           |                | vllm==0.6.2, new sample config            |
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-  * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
++=============================+==============+==============+=========+=================+===========================================+
+| Qwen2.5-14B-Instruct        | 1            | BF16         | 1       | 46.30           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 70.40           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 98.02           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 92.66           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 43.83           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 64.33           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 86.10           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 83.11           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 41.91           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 59.21           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 76.85           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 74.03           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 1       | 37.18           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 49.23           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 60.91           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 59.01           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 1       | 26.85           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 32.83           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 37.67           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 36.71           | setting-64k                               |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 1       | 14.53           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 15.10           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 15.13           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 15.25           | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+
+* [Setting-64k]: gpu_memory_utilization=0.9, max_model_len=65536, enforce_eager=False
+* [new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1, presence_penalty=0, frequency_penalty=0, max_tokens=out_length)
 
 
 
@@ -514,62 +517,63 @@ Notes:
 
 - 32B (vLLM)
 
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
-+=============================+==============+==============+=========+=================+================+===========================================+
-| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 22.13           |                | setting1                                  |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 37.57           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 55.83           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 51.92           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 6144         | BF16         | 1       | 21.05           |                | setting1                                  |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 34.67           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 49.96           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 46.68           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 14336        | BF16         | 1       | 19.91           |                | setting1                                  |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 31.89           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 44.79           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 41.83           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 30720        | BF16         | 2       | 31.82           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 26.88           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 35.66           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 33.75           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 63488        | BF16         | 2       | 24.45           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 18.60           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 22.72           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 21.79           |                | setting-64k                               |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 129024       | BF16         | 2       | 14.31           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 9.77            |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 10.39           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 10.34           |                | vllm==0.6.2, new sample config            |
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
++=============================+==============+==============+=========+=================+===========================================+
+| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 22.13           | Setting 1                                 |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 37.57           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 55.83           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 51.92           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 21.05           | Setting 1                                 |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 34.67           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 49.96           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 46.68           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 19.91           | Setting 1                                 |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 31.89           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 44.79           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 41.83           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 2       | 31.82           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 26.88           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 35.66           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 33.75           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 2       | 24.45           | Setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 18.60           | Setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 22.72           | Setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 21.79           | Setting-64k                               |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 2       | 14.31           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 9.77            | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 10.39           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 10.34           | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
 
   * For context length 129024, the model needs to be predicted with the following config: "model_max_length"=131072
   * [Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)
   * [Setting 1]=(gpu_memory_utilization=1.0 max_model_len=32768 enforce_eager=True)
   * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
+  * [new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1, presence_penalty=0, frequency_penalty=0, max_tokens=out_length)
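
The settings listed in the notes above can be written down as plain keyword arguments. The following is a minimal sketch, not the benchmark script itself: the dictionary keys and the helper name `sampling_kwargs` are hypothetical, and the output length used in the benchmark is not stated here, so it is left as a parameter.

```python
# Engine settings transcribed from the notes above.
# The dictionary keys are hypothetical labels chosen for this sketch.
ENGINE_SETTINGS = {
    "default": dict(gpu_memory_utilization=0.9, max_model_len=32768, enforce_eager=False),
    "setting1": dict(gpu_memory_utilization=1.0, max_model_len=32768, enforce_eager=True),
    "setting-64k": dict(gpu_memory_utilization=0.9, max_model_len=65536, enforce_eager=False),
}


def sampling_kwargs(out_length):
    """Return the "new sample config" sampling parameters as a dict.

    out_length is the number of tokens to generate; its benchmark value
    is not given in the notes, so the caller must supply it.
    """
    return dict(
        temperature=0.7,
        top_p=0.8,
        top_k=20,
        repetition_penalty=1,
        presence_penalty=0,
        frequency_penalty=0,
        max_tokens=out_length,
    )


if __name__ == "__main__":
    # Print the keyword arguments for one configuration.
    print(ENGINE_SETTINGS["setting-64k"])
    print(sampling_kwargs(out_length=128))  # 128 is an arbitrary example value
```

With vLLM installed, these dictionaries could presumably be unpacked as `vllm.LLM(model=..., **ENGINE_SETTINGS["setting1"])` and `vllm.SamplingParams(**sampling_kwargs(out_length))`. The sketch deliberately avoids importing vLLM so that it runs without a GPU.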
 
 
 
@@ -616,63 +620,63 @@ Notes:
 
 - 72B (vLLM)
 
-+------------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-| Model                        | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
-+==============================+==============+==============+=========+=================+================+===========================================+
-| Qwen2.5-72B-Instruct         | 1            | BF16         | 2       | 18.19           |                | Setting 1                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | BF16         | 4       | 31.37           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 31.40           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 1       | 16.47           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 46.30           |                | Setting 2                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 44.30           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 6144         | BF16         | 4       | 29.90           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 29.37           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 1       | 13.88           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 42.50           |                | Setting 3                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 40.67           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 14336        | BF16         | 4       | 30.10           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 27.20           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 38.10           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 36.63           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 30720        | BF16         | 4       | 27.53           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 23.32           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 30.98           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 30.02           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 63488        | BF16         | 4       | 20.74           |                | Setting 4                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 16.27           |                | Setting 4                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 19.84           |                | Setting 4                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 19.32           |                | Setting 4                                 |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 129024       | BF16         | 4       | 12.68           |                | Setting 5                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 4       | 14.11           |                | Setting 5                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 10.11           |                | Setting 5                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 9.88            |                | Setting 5                                 |
-+------------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
++------------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+| Model                        | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
++==============================+==============+==============+=========+=================+===========================================+
+| Qwen2.5-72B-Instruct         | 1            | BF16         | 2       | 18.19           | Setting 1                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | BF16         | 4       | 31.37           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 31.40           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 1       | 16.47           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 46.30           | Setting 2                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 44.30           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 6144         | BF16         | 4       | 29.90           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 29.37           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 1       | 13.88           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 42.50           | Setting 3                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 40.67           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 14336        | BF16         | 4       | 30.10           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 27.20           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 38.10           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 36.63           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 30720        | BF16         | 4       | 27.53           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 23.32           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 30.98           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 30.02           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 63488        | BF16         | 4       | 20.74           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 16.27           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 19.84           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 19.32           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 129024       | BF16         | 4       | 12.68           | Setting 5                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 4       | 14.11           | Setting 5                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 10.11           | Setting 5                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 9.88            | Setting 5                                 |
++------------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
 
   * [Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)
   * [Setting 1]=(gpu_memory_utilization=0.98 max_model_len=4096 enforce_eager=True)