← 返回首页

大语言模型实战—FastChat部署

分类：AI/LLM | 日期：2023-11-22

什么是 FastChat？

FastChat 是一个开放的平台，用于训练、部署和评估基于大语言模型的对话系统。它支持多种大模型，包括 LLaMA、Vicuna、Alpaca 等。

部署准备

硬件要求

GPU：至少 8GB 显存（推荐 16GB 或更多）
内存：至少 32GB
存储：至少 100GB 可用空间

软件要求

Python 3.8+
CUDA 11.8+
Docker（可选）

安装步骤

1. 克隆项目

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

2. 安装依赖

pip3 install -e .

3. 下载模型

从 Hugging Face 下载所需的模型文件，或者使用 FastChat 提供的脚本自动下载。

启动服务

启动控制器

python3 -m fastchat.serve.controller --host localhost --port 21001

启动模型 Worker

python3 -m fastchat.serve.model_worker --model-path /path/to/model --controller-address http://localhost:21001 --worker-address http://localhost:21002

启动 API 服务器

python3 -m fastchat.serve.gradio_web_server --controller-address http://localhost:21001

使用 FastChat

服务启动后，可以通过浏览器访问 Gradio 界面，或者使用 REST API 进行调用。

总结

FastChat 提供了一个简单易用的方式来部署大语言模型，适合个人或小团队使用。通过合理的配置，可以在有限的硬件资源上运行大模型。