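Deploying a FastChat Inference Environment

Basic Information

FastChat is an open platform for training, serving, and evaluating LLM-based chatbots. Its main features are:

Training and evaluation code for state-of-the-art large language models.
A distributed multi-model serving system with a Web UI and OpenAI-compatible RESTful APIs.

Supported models: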


Model architecture	Model name	Example model IDs
AquilaForCausalLM	Aquila	BAAI/AquilaChat2-34B, BAAI/Aquila2-34B, etc.
BaiChuanForCausalLM	Baichuan	baichuan-inc/Baichuan2-7B-Base, baichuan-inc/Baichuan2-13B-Base, etc.
ChatGLMModel	ChatGLM	ZhipuAI/chatglm2-6b, ZhipuAI/chatglm3-6b, etc.
InternLMForCausalLM	InternLM	internlm/internlm-7b, internlm/internlm-chat-7b, etc.
QWenLMHeadModel	Qwen	qwen/Qwen-1_8B-Chat, qwen/Qwen-7B-Chat, qwen/Qwen-14B-Chat, qwen/Qwen-72B-Chat, etc.
LlamaForCausalLM	LLaMA	Llama-2-7b-ms, Llama-2-13b-ms, Llama-2-70b-ms, etc.
YiForCausalLM	Yi	01ai/Yi-6B-Chat, 01ai/Yi-34B-Chat, etc.

Environment Setup and Installation

Pull and run the base image

docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda11.8.0-py310-torch2.1.0-tf2.14.0-1.10.0
docker run -itd --name fastchat -p 7788:8000 -v /nfs/data/models:/models registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda11.8.0-py310-torch2.1.0-tf2.14.0-1.10.0 bash
docker exec -it fastchat bash

Install the latest FastChat package (the PyPI release first, then the current source from GitHub)

pip3 install "fschat[model_worker,webui]"
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip install .
pip install --upgrade transformers

Before deploying a model worker, start the FastChat controller:

python -m fastchat.serve.controller --host 0.0.0.0 &
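Because the controller is started in the background, the next command can run before the controller is listening. The following is a minimal sketch of a readiness check, assuming the controller listens on port 21001 (the port the worker command in this guide points at). The helper name `wait_for_port` is hypothetical and not part of FastChat.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait briefly and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("127.0.0.1", 21001)` inside the container before launching the worker would block until the controller accepts connections or 30 seconds pass.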

Then deploy a model worker, using the Qwen1.5 model as an example:

python -m fastchat.serve.model_worker --host 0.0.0.0 --worker-address http://0.0.0.0:21002 --controller-address http://0.0.0.0:21001 --model-path /models/Qwen1.5-1.8B-Chat --revision v1.0.0 &
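To confirm that the worker has registered, one option is to ask the controller which models it knows about. The sketch below assumes the controller is reachable at `http://localhost:21001` and exposes the `/refresh_all_workers` and `/list_models` POST endpoints (the ones FastChat's `test_message` script appears to call). The helper names `controller_request` and `list_registered_models` are hypothetical.

```python
import json
import urllib.request


def controller_request(controller_url: str, path: str) -> urllib.request.Request:
    """Build a POST request with an empty body for a controller endpoint."""
    return urllib.request.Request(
        controller_url.rstrip("/") + path,
        data=b"",
        method="POST",
    )


def list_registered_models(controller_url: str = "http://localhost:21001") -> list:
    """Refresh the controller's worker table, then return the sorted model names."""
    # Assumed endpoints; adjust if your FastChat version differs.
    refresh = controller_request(controller_url, "/refresh_all_workers")
    with urllib.request.urlopen(refresh, timeout=10):
        pass
    listing = controller_request(controller_url, "/list_models")
    with urllib.request.urlopen(listing, timeout=10) as resp:
        return sorted(json.loads(resp.read())["models"])
```

If the worker registered correctly, `list_registered_models()` should include `Qwen1.5-1.8B-Chat`, since FastChat derives the model name from the last component of `--model-path` by default.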

Command-line inference (terminate the Python process when you are done)

python -m fastchat.serve.cli --model-path /models/Qwen1.5-1.8B-Chat

Inference test

python -m fastchat.serve.test_message --model-name Qwen1.5-1.8B-Chat --message 你好

Start the WebUI service

python -m fastchat.serve.gradio_web_server --host 0.0.0.0 --port 8000 &

This model uses 4700 MiB of GPU memory.
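As a rough sanity check on that figure, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes 1.8 billion parameters (read off the model name) stored as 16-bit floats. Both values are assumptions rather than measurements, and the remainder would be whatever the CUDA context, activations, and KV cache add up to.

```python
MIB = 1024 ** 2  # bytes per mebibyte


def weight_memory_mib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate the memory needed for the model weights alone, in MiB."""
    return num_params * bytes_per_param / MIB


# Assumed values: 1.8e9 parameters, 2 bytes each (16-bit floats).
weights = weight_memory_mib(1.8e9)
# 4700 MiB is the usage reported in this guide.
remainder = 4700 - weights
print(f"weights: about {weights:.0f} MiB, everything else: about {remainder:.0f} MiB")
```

Under these assumptions the weights account for roughly 3433 MiB of the observed 4700 MiB.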

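API Service

Start the API server. It binds container port 8000, the same port as the WebUI command above, so stop the WebUI first; two processes cannot listen on the same port.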

python -m fastchat.serve.openai_api_server --host 0.0.0.0 --controller-address http://0.0.0.0:21001 --port 8000 &

List the available models

curl http://192.168.1.6:7788/v1/models
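The same check can be scripted. This is a minimal sketch, assuming the response follows the OpenAI list format, with a top-level `data` array whose entries carry an `id` field. The base URL is the one from the curl example, and the helper names `extract_model_ids` and `fetch_model_ids` are hypothetical.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list:
    """Pull the model IDs out of an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://192.168.1.6:7788/v1") -> list:
    """GET <base_url>/models and return the IDs of the models being served."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/models", timeout=10) as resp:
        return extract_model_ids(json.load(resp))
```

With the worker from this guide running, `fetch_model_ids()` would be expected to contain `Qwen1.5-1.8B-Chat`.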

API inference test

curl http://192.168.1.6:7788/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen1.5-1.8B-Chat",
    "messages": [{"role": "user", "content": "你是谁？你会做什么？"}]
  }'
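For use from application code, the curl call above can be reproduced with the Python standard library. The sketch below assumes the response follows the OpenAI chat completion format, with the reply text at `choices[0].message.content`. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, user_message: str) -> str:
    """Send one user message and return the model's reply."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_reply(json.load(resp))
```

Calling `chat("http://192.168.1.6:7788/v1", "Qwen1.5-1.8B-Chat", "你是谁？你会做什么？")` would send the same request as the curl command, provided the API server from the previous step is running.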

Embedding test

curl http://192.168.1.6:7788/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen1.5-1.8B-Chat",
    "input": "无代码"
  }'
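A typical next step is to compare the returned vectors. The sketch below assumes the OpenAI embeddings response format, a `data` array of objects with `index` and `embedding` fields, and that the server accepts a list of strings as `input`. The helper names are hypothetical, and cosine similarity is one common comparison measure chosen here for illustration.

```python
import json
import math
import urllib.request


def build_embedding_request(base_url: str, model: str, texts: list) -> urllib.request.Request:
    """Build a POST request for <base_url>/embeddings, without sending it."""
    body = {"model": model, "input": texts}
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_embeddings(response: dict) -> list:
    """Return the embedding vectors ordered by their 'index' field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Sending the request with `urllib.request.urlopen` and passing the parsed JSON to `extract_embeddings` would yield one vector per input text, which can then be compared pairwise with `cosine_similarity`.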

References

https://github.com/lm-sys/FastChat
https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md