Unverified commit a912d239, authored by Xingjun.Wang, committed by GitHub

Rebuild docs for speed benchmark (#1045)

* add qwen2.5 perf report

* update readme

* rebuild docs and fix format issue

* remove fuzzy in speed_benchmark.po

* fix issue

* recover function_call.po

* update

* remove unused code in speed_benchmark.po
Parent: 0f0ecfba
@@ -22,7 +22,7 @@ To learn more about Qwen2.5, feel free to read our documentation \[[EN](https://
- Quantization: the practice of quantizing LLMs with GPTQ, AWQ, as well as the guidance for how to make high-quality quantized GGUF files;
- Training: the instructions for post-training, including SFT and RLHF (TODO) with frameworks like Axolotl, LLaMA-Factory, etc.
- Framework: the usage of Qwen with frameworks for application, e.g., RAG, Agent, etc.
- - Benchmark: the statistics about inference speed and memory footprint (to be updated for Qwen2.5).
+ - Benchmark: the statistics about inference speed and memory footprint (Available for Qwen2.5).
## Introduction
@@ -37,7 +37,7 @@ In the past three months since Qwen2's release, numerous developers have built n
## News
- - 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
+ - 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
- 2024.06.06: We released the Qwen2 series. Check our [blog](https://qwenlm.github.io/blog/qwen2/)!
- 2024.03.28: We released the first MoE model of Qwen: Qwen1.5-MoE-A2.7B! Temporarily, only HF transformers and vLLM support the model. We will soon add the support of llama.cpp, mlx-lm, etc. Check our [blog](https://qwenlm.github.io/blog/qwen-moe/) for more information!
- 2024.02.05: We released the Qwen1.5 series.
@@ -46,7 +46,7 @@ In the past three months since Qwen2's release, numerous developers have built n
Detailed evaluation results are reported in this <a href="https://qwenlm.github.io/blog/qwen2.5/"> 📑 blog</a>.
- For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html) (to be updated for Qwen2.5).
+ For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
## Quickstart
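The speed-benchmark page linked in the diff above reports GPU memory use and generation throughput per model and backend. As a rough, unofficial sanity check against those tables, the sketch below times greedy generation for a Qwen2.5 chat model with Hugging Face transformers and prints tokens per second plus peak GPU memory; the model name, prompt, and generation length are illustrative assumptions, not the benchmark's actual configuration.

```python
# Minimal throughput sketch, not the official speed-benchmark script.
# Assumed/illustrative: model name, prompt, and max_new_tokens.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # any Qwen2.5 instruct model should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain what a KV cache is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Time greedy decoding and derive tokens per second from the newly generated tokens.
start = time.perf_counter()
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output_ids.shape[-1] - input_ids.shape[-1]
print(f"{new_tokens} new tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tokens/s")
if torch.cuda.is_available():
    print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```

Single-request greedy decoding like this will typically underestimate what batched serving stacks (e.g., vLLM) can reach, so treat the number as a lower bound rather than a replacement for the published tables.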