Installing llama.cpp and Converting Model Formats

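Overview

llama.cpp can deploy, run inference on, quantize, and convert the format of large language models.
This guide uses llama.cpp mainly to convert a model file to GGUF and then import it into ollama.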

1. Download and install llama.cpp

Download

GitHub repository: https://github.com/ggml-org/llama.cpp
You can also download an archive from the network drive: https://pan.quark.cn/s/4b66f7f7ed69

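Installation

You need to install WSL and conda first. Tutorials:

Install WSL2: https://www.eogee.com/article/detail/15#1.设置windows环境
Install conda: https://www.eogee.com/article/detail/16#3.%20安装conda

Once both are installed, install llama.cpp as follows:

Download and extract the archive.

Enter the extracted folder, open a WSL terminal there, and run the following commands in order: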

  conda create -n llama-cpp python=3.10  # create a conda environment named llama-cpp
  conda activate llama-cpp  # activate the llama-cpp environment
  pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple  # install the dependencies from the Tsinghua PyPI mirror
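Before moving on, it can help to confirm that the activated environment really picked up the dependencies. The following is only a minimal sketch: the names in `ASSUMED_MODULES` are an assumption about what requirements.txt installs, so compare the list with the file in your own checkout.

```python
import importlib.util
import sys

# Assumption: typical top-level modules needed by convert_hf_to_gguf.py.
# Check requirements.txt in your llama.cpp folder for the real list.
ASSUMED_MODULES = ["numpy", "torch", "transformers", "sentencepiece", "gguf"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major, minor):
    """Check that the interpreter has the version the conda environment was created with."""
    return sys.version_info[:2] == (major, minor)


print("Python is 3.10:", python_matches(3, 10))
print("Missing modules:", missing_modules(ASSUMED_MODULES))
```

Run it inside the llama-cpp environment; an empty "Missing modules" list suggests that the pip install step completed.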

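2. Convert the format and quantize

Download a model in safetensors format

Download or prepare a model in the Hugging Face (hf) safetensors format.

Model files can be downloaded from sites such as huggingface.co, hf-mirror.com, and modelscope.cn.
They can also be downloaded from the network drive: https://pan.quark.cn/s/e99e63bc33e7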
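An incomplete download is a common reason for the conversion to fail, so a quick check of the model folder can save time. This is a hedged sketch rather than an official tool: the helper name `check_hf_model_dir` is made up for illustration, and it only looks for `config.json` and at least one `*.safetensors` file.

```python
from pathlib import Path


def check_hf_model_dir(model_dir):
    """Return a list of problems found in a Hugging Face model directory."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weight files found")
    return problems


# The path is the one used in this guide; adjust it to your own layout.
print(check_hf_model_dir("../../models/Qwen3-0.6B"))
```

An empty list means that the basic files are in place.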

Run the conversion

Run the following command in the WSL terminal:

python convert_hf_to_gguf.py ../../models/Qwen3-0.6B --outtype bf16 --outfile ../../models/Qwen3-0.6B_bf16.gguf

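../../models/Qwen3-0.6B is the path to the model in hf safetensors format; each ../ refers to the parent of the current directory.
--outtype bf16 sets the output type to bf16. The available types are 'f32', 'f16', 'bf16', 'q8_0', 'tq1_0', 'tq2_0', and 'auto'. To decide which one to use, look at Qwen3-0.6B/config.json: "torch_dtype": "bfloat16" means the model weights are stored in bf16.
--outfile ../../models/Qwen3-0.6B_bf16.gguf sets the path and file name of the output model.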
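The config.json lookup described above can be automated. The sketch below assumes the three common `torch_dtype` values map to the matching `--outtype` names and falls back to `auto` otherwise; `pick_outtype` and `build_convert_command` are hypothetical helper names, not part of llama.cpp.

```python
import json
import shlex
from pathlib import Path

# Assumed mapping from torch_dtype in config.json to the --outtype values listed above.
DTYPE_TO_OUTTYPE = {"bfloat16": "bf16", "float16": "f16", "float32": "f32"}


def pick_outtype(model_dir):
    """Read torch_dtype from config.json; fall back to 'auto' if it is absent or unknown."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return DTYPE_TO_OUTTYPE.get(config.get("torch_dtype"), "auto")


def build_convert_command(model_dir, outtype):
    """Build the convert_hf_to_gguf.py command as a list of arguments (it is not executed here)."""
    model_dir = Path(model_dir)
    outfile = model_dir.parent / f"{model_dir.name}_{outtype}.gguf"
    return ["python", "convert_hf_to_gguf.py", str(model_dir),
            "--outtype", outtype, "--outfile", str(outfile)]


# Print the command for the bf16 case used in this guide.
print(shlex.join(build_convert_command("../../models/Qwen3-0.6B", "bf16")))
```

To pick the type automatically, pass `pick_outtype("../../models/Qwen3-0.6B")` as the second argument, then run the printed command from the llama.cpp folder.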

3. Import into ollama

Create the Modelfile

In the directory that contains the converted model file, create a file named Modelfile (no extension) with the following content:

FROM ./Qwen3-0.6B_bf16.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
<|im_start|>assistant
{{ else if eq .Role "assistant" }}{{ .Content }}<|im_end|>
{{ end }}{{ end }}"""

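FROM ./Qwen3-0.6B_bf16.gguf is the path of the model file to import; ./ means the path is relative to the current directory.
TEMPLATE defines the prompt template. You can customize it, and ollama uses it to assemble the conversation prompt. The default template used by llama.cpp can produce garbled output.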
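To see what prompt the template produces, its branching can be mimicked in a few lines of Python. This is an illustrative sketch, not how ollama itself renders templates, and it assumes the usual ChatML layout in which each role marker and each `<|im_end|>` is followed by a newline.

```python
def render_chatml(messages, system=""):
    """Mimic the TEMPLATE logic: wrap each turn in <|im_start|> / <|im_end|> markers."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for message in messages:
        if message["role"] == "user":
            # A user turn also opens the assistant turn so the model continues from there.
            parts.append(
                f"<|im_start|>user\n{message['content']}<|im_end|>\n<|im_start|>assistant\n"
            )
        elif message["role"] == "assistant":
            # An assistant turn only needs its content and the closing marker.
            parts.append(f"{message['content']}<|im_end|>\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "Hello"}], system="You are helpful."))
```

The printed text ends with an open assistant turn, which is where the model starts generating.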

Import the model into ollama

Open a command prompt (cmd) in that directory and run:

ollama create Qwen3-0.6B -f Modelfile

ollama create Qwen3-0.6B creates an ollama model named Qwen3-0.6B.
-f Modelfile specifies the Modelfile to import from.
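Once the model has been created, one way to try it is through ollama's local HTTP API. The sketch below assumes the ollama service is running on its default address, http://localhost:11434, and that the model name matches the one passed to ollama create; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint


def build_chat_request(model, prompt):
    """Build a non-streaming POST request for ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as response:
        return json.loads(response.read())["message"]["content"]
```

With the service running, `print(chat("Qwen3-0.6B", "Hello"))` should print a reply from the imported model.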


Author: Administrator
Published: May 6, 2025
Updated: May 6, 2025
