diff --git a/README.md b/README.md
index b6f99135e600828aa36e6575213f9975e7cdc2ab..26671653976e552d81346c1c8f7b301eb40306a6 100644
--- a/README.md
+++ b/README.md
@@ -168,7 +168,7 @@ Clone [`llamafile`](https://github.com/Mozilla-Ocho/llamafile), run source insta
 
 ## Deployment
 
-Qwen2.5 is supported by multiple inference frameworks. Here we demonstrate the usage of `vLLM` and `SGLang`.
+Qwen2.5 is supported by multiple inference frameworks. Here we demonstrate the usage of `vLLM`, `SGLang` and `OpenLLM`.
 
 ### vLLM
 
@@ -254,6 +254,16 @@ for m in state.messages():
 print(state["answer_1"])
 ```
 
+### OpenLLM
+
+[OpenLLM](https://github.com/bentoml/OpenLLM) allows you to easily run Qwen2.5 as OpenAI-compatible APIs. You can start a model server using `openllm serve`. For example:
+
+```bash
+openllm serve qwen2.5:7b
+```
+
+The server is active at `http://localhost:3000/`, providing OpenAI-compatible APIs. You can create an OpenAI client to call its chat API. For more information, refer to [our documentation](https://qwen.readthedocs.io/en/latest/deployment/openllm.html).
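+
+For example, you can point the official `openai` Python client at this server. The snippet below is a minimal sketch; the model name and the placeholder API key follow the example in the linked documentation and assume the `qwen2.5:7b` server started above:
+
+```python
+from openai import OpenAI
+
+# The server exposes OpenAI-compatible routes under /v1; 'na' is a placeholder API key, as in the linked docs.
+client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-7B-Instruct",
+    messages=[{"role": "user", "content": "Tell me something about large language models."}],
+)
+print(response.choices[0].message.content)
+```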
+msgstr "使用 ``pip`` 安装 OpenLLM。" + +#: ../../source/deployment/openllm.rst:15 51e9e7ea8fd14537b9573e1adfd6358a +msgid "Verify the installation and display the help information:" +msgstr "验è¯å®‰è£…并显示帮助信æ¯ï¼š" + +#: ../../source/deployment/openllm.rst:22 854205ed7929464ca8aa4138b5b03f46 +msgid "Quickstart" +msgstr "快速开始" + +#: ../../source/deployment/openllm.rst:24 cd1086ff08b5402dbfdd3721b4454a7b +msgid "Before you run any Qwen2.5 model, ensure your model repository is up to date by syncing it with OpenLLM's latest official repository." +msgstr "在è¿è¡Œä»»ä½• Qwen2.5 模型之å‰ï¼Œç¡®ä¿æ‚¨çš„æ¨¡åž‹ä»“库与 OpenLLM çš„æœ€æ–°å®˜æ–¹ä»“åº“åŒæ¥ã€‚" + +#: ../../source/deployment/openllm.rst:30 e48b1b0b3ef24ebf8344f1e8e7128026 +msgid "List the supported Qwen2.5 models:" +msgstr "列出支æŒçš„ Qwen2.5 模型:" + +#: ../../source/deployment/openllm.rst:36 59f91e90b1264a749c6182f1e8da9070 +msgid "The results also display the required GPU resources and supported platforms:" +msgstr "结果还会显示所需的 GPU 资æºå’Œæ”¯æŒçš„å¹³å°ï¼š" + +#: ../../source/deployment/openllm.rst:54 5dcf3a784bd042448f0a122818c52395 +msgid "To start a server with one of the models, use ``openllm serve`` like this:" +msgstr "è¦ä½¿ç”¨å…¶ä¸ä¸€ä¸ªæ¨¡åž‹æ¥å¯åЍæœåŠ¡å™¨ï¼Œè¯·ä½¿ç”¨ ``openllm serve`` 命令,例如:" + +#: ../../source/deployment/openllm.rst:60 e637a01782a041cbbb044ed0c5bfb48f +msgid "By default, the server starts at ``http://localhost:3000/``." +msgstr "默认情况下,æœåС噍å¯åŠ¨åœ¨ http://localhost:3000/。" + +#: ../../source/deployment/openllm.rst:63 8457ac65af3f4ec5853f0b8a0a725708 +msgid "Interact with the model server" +msgstr "与模型æœåŠ¡å™¨äº¤äº’" + +#: ../../source/deployment/openllm.rst:65 6f0aa0deae1a4de0b694d35bbef3c5c7 +msgid "With the model server up and running, you can call its APIs in the following ways:" +msgstr "æœåС噍è¿è¡ŒåŽï¼Œå¯ä»¥é€šè¿‡ä»¥ä¸‹æ–¹å¼è°ƒç”¨å…¶ API:" + +#: ../../source/deployment/openllm.rst 80e56ac95bab4c478bef739c854b17a2 +msgid "CURL" +msgstr "CURL" + +#: ../../source/deployment/openllm.rst:71 781c1253b9f8482fb3371b66819f4273 +msgid "Send an HTTP request to its ``/generate`` endpoint via CURL:" +msgstr "通过 CURL å‘å…¶ ``/generate`` 端点å‘é€ HTTP 请求:" + +#: ../../source/deployment/openllm.rst cec41082c62447d2b78b8f6a59305d9e +msgid "Python client" +msgstr "Python 客户端" + +#: ../../source/deployment/openllm.rst:88 3c1852f6303c40fbb7c7f1ded4bf2313 +msgid "Call the OpenAI-compatible endpoints with frameworks and tools that support the OpenAI API protocol. Here is an example:" +msgstr "ä½¿ç”¨æ”¯æŒ OpenAI API å议的框架和工具æ¥è°ƒç”¨ã€‚例如:" + +#: ../../source/deployment/openllm.rst 4b9859905ccf4198960623fb08729f48 +msgid "Chat UI" +msgstr "èŠå¤© UI" + +#: ../../source/deployment/openllm.rst:115 2bd4d8290f0546c3b22ca985ecfa9d13 +msgid "OpenLLM provides a chat UI at the ``/chat`` endpoint for the LLM server at http://localhost:3000/chat." +msgstr "OpenLLM 为 LLM æœåС噍æä¾›çš„èŠå¤© UI ä½äºŽ ``/chat`` 端点,地å€ä¸º http://localhost:3000/chat。" + +#: ../../source/deployment/openllm.rst:120 57ce35b171cd42088ed23be0862611b3 +msgid "Model repository" +msgstr "模型仓库" + +#: ../../source/deployment/openllm.rst:122 3a642d5e0c4c4a7484d05eebfdf0a3ac +msgid "A model repository in OpenLLM represents a catalog of available LLMs. You can add your own repository to OpenLLM with custom Qwen2.5 variants for your specific needs. See our `documentation to learn details <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_." 
+msgstr "OpenLLM ä¸çš„æ¨¡åž‹ä»“库表示å¯ç”¨çš„ LLM 目录。您å¯ä»¥ä¸º OpenLLM æ·»åŠ è‡ªå®šä¹‰çš„ Qwen2.5 模型仓库,以满足您的特定需求。请å‚阅 `我们的文档 <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_ 了解详细信æ¯ã€‚" + diff --git a/docs/source/assets/qwen-openllm-ui-demo.png b/docs/source/assets/qwen-openllm-ui-demo.png new file mode 100644 index 0000000000000000000000000000000000000000..f887cde12cfe48c594944cc0066d23954088f457 Binary files /dev/null and b/docs/source/assets/qwen-openllm-ui-demo.png differ diff --git a/docs/source/deployment/openllm.rst b/docs/source/deployment/openllm.rst new file mode 100644 index 0000000000000000000000000000000000000000..b505cdac3c818f26d2eab9560b9d590ba27202f3 --- /dev/null +++ b/docs/source/deployment/openllm.rst @@ -0,0 +1,122 @@ +OpenLLM +======= + +OpenLLM allows developers to run Qwen2.5 models of different sizes as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployment with Qwen2.5. Visit `the OpenLLM repository <https://github.com/bentoml/OpenLLM/>`_ to learn more. + +Installation +------------ + +Install OpenLLM using ``pip``. + +.. code:: bash + + pip install openllm + +Verify the installation and display the help information: + +.. code:: bash + + openllm --help + +Quickstart +---------- + +Before you run any Qwen2.5 model, ensure your model repository is up to date by syncing it with OpenLLM's latest official repository. + +.. code:: bash + + openllm repo update + +List the supported Qwen2.5 models: + +.. code:: bash + + openllm model list --tag qwen2.5 + +The results also display the required GPU resources and supported platforms: + +.. code:: bash + + model version repo required GPU RAM platforms + ------- --------------------- ------- ------------------ ----------- + qwen2.5 qwen2.5:0.5b default 12G linux + qwen2.5:1.5b default 12G linux + qwen2.5:3b default 12G linux + qwen2.5:7b default 24G linux + qwen2.5:14b default 80G linux + qwen2.5:14b-ggml-q4 default macos + qwen2.5:14b-ggml-q8 default macos + qwen2.5:32b default 80G linux + qwen2.5:32b-ggml-fp16 default macos + qwen2.5:72b default 80Gx2 linux + qwen2.5:72b-ggml-q4 default macos + +To start a server with one of the models, use ``openllm serve`` like this: + +.. code:: bash + + openllm serve qwen2.5:7b + +By default, the server starts at ``http://localhost:3000/``. + +Interact with the model server +------------------------------ + +With the model server up and running, you can call its APIs in the following ways: + +.. tab-set:: + + .. tab-item:: CURL + + Send an HTTP request to its ``/generate`` endpoint via CURL: + + .. code-block:: bash + + curl -X 'POST' \ + 'http://localhost:3000/api/generate' \ + -H 'accept: text/event-stream' \ + -H 'Content-Type: application/json' \ + -d '{ + "prompt": "Tell me something about large language models.", + "model": "Qwen/Qwen2.5-7B-Instruct", + "max_tokens": 2048, + "stop": null + }' + + .. tab-item:: Python client + + Call the OpenAI-compatible endpoints with frameworks and tools that support the OpenAI API protocol. Here is an example: + + .. 
+      .. code-block:: python
+
+         from openai import OpenAI
+
+         client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')
+
+         # Use the following func to get the available models
+         # model_list = client.models.list()
+         # print(model_list)
+
+         chat_completion = client.chat.completions.create(
+             model="Qwen/Qwen2.5-7B-Instruct",
+             messages=[
+                 {
+                     "role": "user",
+                     "content": "Tell me something about large language models."
+                 }
+             ],
+             stream=True,
+         )
+         for chunk in chat_completion:
+             print(chunk.choices[0].delta.content or "", end="")
+
+   .. tab-item:: Chat UI
+
+      OpenLLM provides a chat UI at the ``/chat`` endpoint for the LLM server at http://localhost:3000/chat.
+
+      .. image:: ../../source/assets/qwen-openllm-ui-demo.png
+
+Model repository
+----------------
+
+A model repository in OpenLLM represents a catalog of available LLMs. You can add your own repository to OpenLLM with custom Qwen2.5 variants for your specific needs. See our `documentation to learn details <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_.
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 7c11c23c1f0e2d3f371ae04b3a7cd5920eac2e4d..a3fc7a4dbea214a5ac06b3bf714782e0825f4245 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -79,6 +79,7 @@ Join our community by joining our `Discord <https://discord.gg/yPEP2vHTu4>`__ an
    deployment/vllm
    deployment/tgi
    deployment/skypilot
+   deployment/openllm
 
 .. toctree::
    :maxdepth: 2