diff --git a/docs/source/quantization/gptq.md b/docs/source/quantization/gptq.md
index 4faa3a982c2337bc9d1b92cf90a472ea42518dfb..a3b050d394d0ac64c0eb753191e115691871259a 100644
--- a/docs/source/quantization/gptq.md
+++ b/docs/source/quantization/gptq.md
@@ -5,6 +5,16 @@ In this document, we show you how to use the quantized model with Hugging Face `
 
 ## Usage of GPTQ Models with Hugging Face transformers
 
+:::{note}
+
+To use the official Qwen2.5 GPTQ models with `transformers`, please ensure that `optimum>=1.20.0` and compatible versions of `transformers` and `auto_gptq` are installed.
+
+You can do that by running:
+```bash
+pip install -U "optimum>=1.20.0"
+```
+:::
+
 Now, `transformers` has officially supported AutoGPTQ, which means that you can directly use the quantized model with `transformers`. 
 For each size of Qwen2.5, we provide both Int4 and Int8 GPTQ quantized models.
 The following is a very simple code snippet showing how to run `Qwen2.5-7B-Instruct-GPTQ-Int4`:
@@ -204,6 +214,29 @@ For sharding, you need to load the model and use `save_pretrained` from transfor
 Except for this, everything is so simple. 
 Enjoy!
 
+
+## Known Issues
+
+### Qwen2.5-72B-Instruct-GPTQ-Int4 cannot stop generation properly
+
+:Model: Qwen2.5-72B-Instruct-GPTQ-Int4
+:Framework: vLLM, AutoGPTQ (including Hugging Face transformers)
+:Description: Generation does not stop properly. The model keeps generating past the point where it should stop and then produces repeated text, which may be a single character, a phrase, or whole paragraphs.
+:Workaround: The following workarounds could be considered (see the sketch below the list):
+    1. Using the original model in 16-bit floating point
+    2. Using the AWQ variants or llama.cpp-based models for reduced chances of abnormal generation
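+
+For example, switching to the original 16-bit model or the AWQ variant only requires changing the model ID when loading with `transformers`. The snippet below is a minimal sketch, assuming the usual `Qwen/...` repository IDs on the Hugging Face Hub:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Workaround 1: the original model in 16-bit floating point
+model_name = "Qwen/Qwen2.5-72B-Instruct"
+# Workaround 2: the AWQ variant, which is less prone to this issue
+# model_name = "Qwen/Qwen2.5-72B-Instruct-AWQ"
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto",
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+```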
+
+### Qwen2.5-32B-Instruct-GPTQ-Int4 broken with vLLM on multiple GPUs
+
+:Model: Qwen2.5-32B-Instruct-GPTQ-Int4
+:Framework: vLLM
+:Description: When deployed on multiple GPUs, the model generates only garbled text like `!!!!!!!!!!!!!!!!!!`.
+:Workaround: Each of the following workarounds could be considered (see the sketch after this list):
+    1. Using the AWQ or GPTQ-Int8 variants
+    2. Using a single GPU
+    3. Using Hugging Face `transformers` if latency and throughput are not major concerns
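+
+As an illustration, the first two workarounds map onto vLLM's Python API as follows. This is a minimal sketch, assuming the usual `Qwen/...` repository IDs on the Hugging Face Hub:
+
+```python
+from vllm import LLM, SamplingParams
+
+# Workaround 1: use the AWQ or GPTQ-Int8 variant, which works across multiple GPUs
+llm = LLM(
+    model="Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8",  # or "Qwen/Qwen2.5-32B-Instruct-AWQ"
+    tensor_parallel_size=4,
+)
+
+# Workaround 2: keep the GPTQ-Int4 model but run it on a single GPU
+# llm = LLM(model="Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4", tensor_parallel_size=1)
+
+outputs = llm.generate(
+    ["Give me a short introduction to large language models."],
+    SamplingParams(max_tokens=128),
+)
+print(outputs[0].outputs[0].text)
+```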
+
+
 ## Troubleshooting
 
 :::{dropdown} With `transformers` and `auto_gptq`, the logs suggest `CUDA extension not installed.` and the inference is slow.