diff --git a/README.md b/README.md
index b6f99135e600828aa36e6575213f9975e7cdc2ab..26671653976e552d81346c1c8f7b301eb40306a6 100644
--- a/README.md
+++ b/README.md
@@ -168,7 +168,7 @@ Clone [`llamafile`](https://github.com/Mozilla-Ocho/llamafile), run source insta
 
 ## Deployment
 
-Qwen2.5 is supported by multiple inference frameworks. Here we demonstrate the usage of `vLLM` and `SGLang`.
+Qwen2.5 is supported by multiple inference frameworks. Here we demonstrate the usage of `vLLM`, `SGLang` and `OpenLLM`.
 
 ### vLLM
 
@@ -254,6 +254,16 @@ for m in state.messages():
 print(state["answer_1"])
 ```
 
+### OpenLLM
+
+[OpenLLM](https://github.com/bentoml/OpenLLM) allows you to easily run Qwen2.5 as OpenAI-compatible APIs. You can start a model server using `openllm serve`. For example:
+
+```bash
+openllm serve qwen2.5:7b
+```
+
+The server is active at `http://localhost:3000/`, providing OpenAI-compatible APIs. You can create an OpenAI client to call its chat API. For more information, refer to [our documentation](https://qwen.readthedocs.io/en/latest/deployment/openllm.html).
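+
+For example, you can point the official `openai` Python client at this server. The snippet below is a minimal sketch; the model name and the placeholder API key follow the example in the linked documentation and assume the `qwen2.5:7b` server started above:
+
+```python
+from openai import OpenAI
+
+# The server exposes OpenAI-compatible routes under /v1; 'na' is a placeholder API key, as in the linked docs.
+client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-7B-Instruct",
+    messages=[{"role": "user", "content": "Tell me something about large language models."}],
+)
+print(response.choices[0].message.content)
+```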
+msgstr "使用 ``pip`` 安装 OpenLLM。" + +#: ../../source/deployment/openllm.rst:15 51e9e7ea8fd14537b9573e1adfd6358a +msgid "Verify the installation and display the help information:" +msgstr "验è¯å®‰è£…并显示帮助信æ¯ï¼š" + +#: ../../source/deployment/openllm.rst:22 854205ed7929464ca8aa4138b5b03f46 +msgid "Quickstart" +msgstr "快速开始" + +#: ../../source/deployment/openllm.rst:24 cd1086ff08b5402dbfdd3721b4454a7b +msgid "Before you run any Qwen2.5 model, ensure your model repository is up to date by syncing it with OpenLLM's latest official repository." +msgstr "在è¿è¡Œä»»ä½• Qwen2.5 模型之å‰ï¼Œç¡®ä¿æ‚¨çš„æ¨¡åž‹ä»“库与 OpenLLM çš„æœ€æ–°å®˜æ–¹ä»“åº“åŒæ¥ã€‚" + +#: ../../source/deployment/openllm.rst:30 e48b1b0b3ef24ebf8344f1e8e7128026 +msgid "List the supported Qwen2.5 models:" +msgstr "列出支æŒçš„ Qwen2.5 模型:" + +#: ../../source/deployment/openllm.rst:36 59f91e90b1264a749c6182f1e8da9070 +msgid "The results also display the required GPU resources and supported platforms:" +msgstr "结果还会显示所需的 GPU 资æºå’Œæ”¯æŒçš„å¹³å°ï¼š" + +#: ../../source/deployment/openllm.rst:54 5dcf3a784bd042448f0a122818c52395 +msgid "To start a server with one of the models, use ``openllm serve`` like this:" +msgstr "è¦ä½¿ç”¨å…¶ä¸ä¸€ä¸ªæ¨¡åž‹æ¥å¯åЍæœåŠ¡å™¨ï¼Œè¯·ä½¿ç”¨ ``openllm serve`` 命令,例如:" + +#: ../../source/deployment/openllm.rst:60 e637a01782a041cbbb044ed0c5bfb48f +msgid "By default, the server starts at ``http://localhost:3000/``." +msgstr "默认情况下,æœåС噍å¯åŠ¨åœ¨ http://localhost:3000/。" + +#: ../../source/deployment/openllm.rst:63 8457ac65af3f4ec5853f0b8a0a725708 +msgid "Interact with the model server" +msgstr "与模型æœåŠ¡å™¨äº¤äº’" + +#: ../../source/deployment/openllm.rst:65 6f0aa0deae1a4de0b694d35bbef3c5c7 +msgid "With the model server up and running, you can call its APIs in the following ways:" +msgstr "æœåС噍è¿è¡ŒåŽï¼Œå¯ä»¥é€šè¿‡ä»¥ä¸‹æ–¹å¼è°ƒç”¨å…¶ API:" + +#: ../../source/deployment/openllm.rst 80e56ac95bab4c478bef739c854b17a2 +msgid "CURL" +msgstr "CURL" + +#: ../../source/deployment/openllm.rst:71 781c1253b9f8482fb3371b66819f4273 +msgid "Send an HTTP request to its ``/generate`` endpoint via CURL:" +msgstr "通过 CURL å‘å…¶ ``/generate`` 端点å‘é€ HTTP 请求:" + +#: ../../source/deployment/openllm.rst cec41082c62447d2b78b8f6a59305d9e +msgid "Python client" +msgstr "Python 客户端" + +#: ../../source/deployment/openllm.rst:88 3c1852f6303c40fbb7c7f1ded4bf2313 +msgid "Call the OpenAI-compatible endpoints with frameworks and tools that support the OpenAI API protocol. Here is an example:" +msgstr "ä½¿ç”¨æ”¯æŒ OpenAI API å议的框架和工具æ¥è°ƒç”¨ã€‚例如:" + +#: ../../source/deployment/openllm.rst 4b9859905ccf4198960623fb08729f48 +msgid "Chat UI" +msgstr "èŠå¤© UI" + +#: ../../source/deployment/openllm.rst:115 2bd4d8290f0546c3b22ca985ecfa9d13 +msgid "OpenLLM provides a chat UI at the ``/chat`` endpoint for the LLM server at http://localhost:3000/chat." +msgstr "OpenLLM 为 LLM æœåС噍æä¾›çš„èŠå¤© UI ä½äºŽ ``/chat`` 端点,地å€ä¸º http://localhost:3000/chat。" + +#: ../../source/deployment/openllm.rst:120 57ce35b171cd42088ed23be0862611b3 +msgid "Model repository" +msgstr "模型仓库" + +#: ../../source/deployment/openllm.rst:122 3a642d5e0c4c4a7484d05eebfdf0a3ac +msgid "A model repository in OpenLLM represents a catalog of available LLMs. You can add your own repository to OpenLLM with custom Qwen2.5 variants for your specific needs. See our `documentation to learn details <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_." 
+msgstr "OpenLLM ä¸çš„æ¨¡åž‹ä»“库表示å¯ç”¨çš„ LLM 目录。您å¯ä»¥ä¸º OpenLLM æ·»åŠ è‡ªå®šä¹‰çš„ Qwen2.5 模型仓库,以满足您的特定需求。请å‚阅 `我们的文档 <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_ 了解详细信æ¯ã€‚" + diff --git a/docs/source/assets/qwen-openllm-ui-demo.png b/docs/source/assets/qwen-openllm-ui-demo.png new file mode 100644 index 0000000000000000000000000000000000000000..f887cde12cfe48c594944cc0066d23954088f457 Binary files /dev/null and b/docs/source/assets/qwen-openllm-ui-demo.png differ diff --git a/docs/source/deployment/openllm.rst b/docs/source/deployment/openllm.rst new file mode 100644 index 0000000000000000000000000000000000000000..b505cdac3c818f26d2eab9560b9d590ba27202f3 --- /dev/null +++ b/docs/source/deployment/openllm.rst @@ -0,0 +1,122 @@ +OpenLLM +======= + +OpenLLM allows developers to run Qwen2.5 models of different sizes as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployment with Qwen2.5. Visit `the OpenLLM repository <https://github.com/bentoml/OpenLLM/>`_ to learn more. + +Installation +------------ + +Install OpenLLM using ``pip``. + +.. code:: bash + + pip install openllm + +Verify the installation and display the help information: + +.. code:: bash + + openllm --help + +Quickstart +---------- + +Before you run any Qwen2.5 model, ensure your model repository is up to date by syncing it with OpenLLM's latest official repository. + +.. code:: bash + + openllm repo update + +List the supported Qwen2.5 models: + +.. code:: bash + + openllm model list --tag qwen2.5 + +The results also display the required GPU resources and supported platforms: + +.. code:: bash + + model version repo required GPU RAM platforms + ------- --------------------- ------- ------------------ ----------- + qwen2.5 qwen2.5:0.5b default 12G linux + qwen2.5:1.5b default 12G linux + qwen2.5:3b default 12G linux + qwen2.5:7b default 24G linux + qwen2.5:14b default 80G linux + qwen2.5:14b-ggml-q4 default macos + qwen2.5:14b-ggml-q8 default macos + qwen2.5:32b default 80G linux + qwen2.5:32b-ggml-fp16 default macos + qwen2.5:72b default 80Gx2 linux + qwen2.5:72b-ggml-q4 default macos + +To start a server with one of the models, use ``openllm serve`` like this: + +.. code:: bash + + openllm serve qwen2.5:7b + +By default, the server starts at ``http://localhost:3000/``. + +Interact with the model server +------------------------------ + +With the model server up and running, you can call its APIs in the following ways: + +.. tab-set:: + + .. tab-item:: CURL + + Send an HTTP request to its ``/generate`` endpoint via CURL: + + .. code-block:: bash + + curl -X 'POST' \ + 'http://localhost:3000/api/generate' \ + -H 'accept: text/event-stream' \ + -H 'Content-Type: application/json' \ + -d '{ + "prompt": "Tell me something about large language models.", + "model": "Qwen/Qwen2.5-7B-Instruct", + "max_tokens": 2048, + "stop": null + }' + + .. tab-item:: Python client + + Call the OpenAI-compatible endpoints with frameworks and tools that support the OpenAI API protocol. Here is an example: + + .. 
+      .. code-block:: python
+
+         from openai import OpenAI
+
+         client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')
+
+         # Use the following func to get the available models
+         # model_list = client.models.list()
+         # print(model_list)
+
+         chat_completion = client.chat.completions.create(
+             model="Qwen/Qwen2.5-7B-Instruct",
+             messages=[
+                 {
+                     "role": "user",
+                     "content": "Tell me something about large language models."
+                 }
+             ],
+             stream=True,
+         )
+         for chunk in chat_completion:
+             print(chunk.choices[0].delta.content or "", end="")
+
+   .. tab-item:: Chat UI
+
+      OpenLLM provides a chat UI at the ``/chat`` endpoint for the LLM server at http://localhost:3000/chat.
+
+      .. image:: ../../source/assets/qwen-openllm-ui-demo.png
+
+Model repository
+----------------
+
+A model repository in OpenLLM represents a catalog of available LLMs. You can add your own repository to OpenLLM with custom Qwen2.5 variants for your specific needs. See our `documentation to learn details <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_.
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 7c11c23c1f0e2d3f371ae04b3a7cd5920eac2e4d..a3fc7a4dbea214a5ac06b3bf714782e0825f4245 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -79,6 +79,7 @@ Join our community by joining our `Discord <https://discord.gg/yPEP2vHTu4>`__ an
    deployment/vllm
    deployment/tgi
    deployment/skypilot
+   deployment/openllm
 
 .. toctree::
    :maxdepth: 2