
CODE - AI - Python - How to convert PyTorch `.pth` tensors to the GGUF format

GGUF is usually expanded as "GPT-Generated Unified Format." It is a binary file format designed for storing large language models (LLMs) so that they can be loaded quickly for inference.

The format evolved from the earlier GGML format, which takes its name from the GGML tensor library created by Georgi Gerganov (hence the "GG" prefix) and used by his llama.cpp project. GGUF was introduced in August 2023 as a replacement for GGML.

Key features of the GGUF format include:

- Efficient quantization support - allowing models to be compressed to smaller bit-widths (like 4-bit or 8-bit) while maintaining reasonable performance
- Unified metadata storage - embedding important model information directly in the file
- Better versioning and compatibility across different implementations
- Improved memory mapping capabilities for faster loading

GGUF has become particularly popular in the open-source AI community because it allows running large language models on consumer hardware with reasonable performance. It's the primary format used by projects like llama.cpp, which enable running LLMs locally on personal computers rather than requiring cloud infrastructure.
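
To make the effect of quantization on file size concrete, here is a rough back-of-the-envelope sketch. It assumes the block layouts of two common ggml quantization types (`Q8_0` and `Q4_0`, each storing 32 weights plus one 16-bit scale per block); the function names and the 7-billion-parameter figure are illustrative, and real files mix several tensor types, so treat the numbers as estimates only.

```python
# Weights are grouped into blocks of 32; each quantized block also stores
# one 16-bit (2-byte) scale factor.
BLOCK_SIZE = 32
BYTES_PER_BLOCK = {
    "F16": 2 * BLOCK_SIZE,        # 64 bytes: unquantized half precision, per 32 weights
    "Q8_0": 2 + BLOCK_SIZE,       # 34 bytes: fp16 scale + 32 int8 values
    "Q4_0": 2 + BLOCK_SIZE // 2,  # 18 bytes: fp16 scale + 32 packed 4-bit values
}


def bits_per_weight(quant_type: str) -> float:
    """Average storage cost of one weight, including the per-block scale."""
    return BYTES_PER_BLOCK[quant_type] * 8 / BLOCK_SIZE


def estimate_size_gib(n_params: int, quant_type: str) -> float:
    """Rough tensor-data size in GiB; ignores metadata and mixed-precision layers."""
    return n_params * bits_per_weight(quant_type) / 8 / 1024**3


if __name__ == "__main__":
    n_params = 7_000_000_000  # illustrative model size
    for quant_type in BYTES_PER_BLOCK:
        size = estimate_size_gib(n_params, quant_type)
        print(f"{quant_type}: {bits_per_weight(quant_type)} bits/weight, ~{size:.1f} GiB")
```

The per-block scale is why a "4-bit" type costs slightly more than 4 bits per weight on average.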

```python
import torch
import numpy as np
import gguf
from transformers import AutoConfig
```
- Imports the PyTorch library for deep learning and model handling
- Imports NumPy for numerical operations and array manipulation
- Imports the GGUF library which handles the target format conversion
- Imports AutoConfig from Hugging Face's transformers library to load model configurations

```python
# 1. Load your PyTorch model weights
model_path = "your_model.pth"
# weights_only=True limits unpickling to tensors and plain containers
model_weights = torch.load(model_path, map_location="cpu", weights_only=True)
```
- Sets `model_path` to the location of your PyTorch model file (you'd replace "your_model.pth" with your actual file path)
- Loads the model weights from that path using PyTorch's load function
- `map_location="cpu"` ensures the weights are loaded into CPU memory, even if they were saved from a GPU
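
A `.pth` file does not always contain a flat name-to-tensor dictionary. Training checkpoints often nest the weights under a key, and some files hold a whole pickled module. The helper below is a minimal sketch of one way to normalize these cases; the function name and the list of wrapper keys are assumptions for illustration, not part of PyTorch.

```python
from collections.abc import Mapping

# Wrapper keys seen in some training checkpoints (an assumed, non-exhaustive list).
WRAPPER_KEYS = ("state_dict", "model_state_dict", "model")


def unwrap_state_dict(checkpoint):
    """Return the name -> tensor mapping contained in a loaded checkpoint."""
    # A pickled nn.Module exposes its weights through state_dict().
    if hasattr(checkpoint, "state_dict") and callable(checkpoint.state_dict):
        return dict(checkpoint.state_dict())
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"unsupported checkpoint type: {type(checkpoint).__name__}")
    # Look for the weights nested under a well-known wrapper key.
    for key in WRAPPER_KEYS:
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return dict(inner)
    # Otherwise assume the file already is a flat state dict.
    return dict(checkpoint)
```

It could be applied directly to the result of the load call, for example `model_weights = unwrap_state_dict(torch.load(model_path, map_location="cpu"))`.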

```python
# 2. Get model configuration
config = AutoConfig.from_pretrained("path_to_config_or_model_dir")
```
- Creates a configuration object by loading from either a config file or model directory
- This contains essential parameters like vocabulary size, model dimensions, etc.
- You'd replace "path_to_config_or_model_dir" with your actual config path

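```python
# 3. Create a GGUF writer (the architecture name is a constructor argument)
builder = gguf.GGUFWriter("output_model.gguf", arch="llama")
```
- Initializes a `GGUFWriter` from the `gguf` package, which collects metadata and tensors and then writes the output file
- "output_model.gguf" is the filename for the converted model
- `arch="llama"` names the model architecture (it would be different for other model types)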

```python
# 4. Add metadata (keys are stored under the architecture prefix, e.g. llama.context_length)
builder.add_vocab_size(config.vocab_size)
builder.add_context_length(config.max_position_embeddings)
builder.add_embedding_length(config.hidden_size)
```
- In the `gguf` package the architecture name is an argument of the `GGUFWriter` constructor, and recent versions record it as `general.architecture` automatically, so it is not set here with a separate call
- Adds the vocabulary size from the config (number of tokens the model knows)
- Adds the maximum context length (how many tokens the model can process at once)
- Adds the embedding dimension (the size of the vector representing each token)
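
To show where these values end up, the sketch below maps Hugging Face-style config attributes to the key names GGUF uses for architecture-specific metadata (`<arch>.context_length`, `<arch>.embedding_length`, and so on). The helper name and the example values are hypothetical, and a `SimpleNamespace` stands in for the real `AutoConfig` object so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def gguf_metadata_from_config(config, arch="llama"):
    """Map Hugging Face-style config attributes to GGUF metadata key names."""
    wanted = {
        "vocab_size": f"{arch}.vocab_size",
        "max_position_embeddings": f"{arch}.context_length",
        "hidden_size": f"{arch}.embedding_length",
        "num_hidden_layers": f"{arch}.block_count",
        "num_attention_heads": f"{arch}.attention.head_count",
    }
    metadata = {"general.architecture": arch}
    for attr, key in wanted.items():
        value = getattr(config, attr, None)
        if value is not None:  # skip fields this config does not define
            metadata[key] = int(value)
    return metadata


# Stand-in for an AutoConfig object, with illustrative values.
example_config = SimpleNamespace(
    vocab_size=32000,
    max_position_embeddings=4096,
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
)

if __name__ == "__main__":
    for key, value in gguf_metadata_from_config(example_config).items():
        print(f"{key} = {value}")
```

Building the dictionary first makes it easy to inspect or log the metadata before anything is written to disk.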

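```python
# 5. Convert and add tensor weights
for name, param in model_weights.items():
    # NumPy has no bfloat16 dtype, so upcast such tensors first
    if param.dtype == torch.bfloat16:
        param = param.to(torch.float32)
    # Convert PyTorch tensor to numpy array
    tensor = param.detach().cpu().numpy()
    # Add to GGUF file
    builder.add_tensor(name, tensor)
```
- Loops through each named parameter in the model weights dictionary
- For each parameter:
  - Upcasts `bfloat16` tensors to `float32`, because `.numpy()` fails on `bfloat16`
  - Detaches it from any autograd graph with `.detach()` and ensures it's on CPU with `.cpu()`
  - Converts it to a NumPy array with `.numpy()`
  - Adds the tensor to the GGUF file with the same name; note that llama.cpp expects its own tensor names (for example `token_embd.weight`), so PyTorch names usually have to be remapped before the file will load there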
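
Because inference engines look tensors up by name, a converter typically renames them on the way in. The following is a minimal sketch of such a mapping for Hugging Face Llama-style parameter names; the function name is hypothetical, the table covers only the common weights, and a full converter does more than rename (for instance, it may also reorder some attention weights).

```python
import re

# Tensors that exist once per model.
TOP_LEVEL = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}

# Tensors that exist once per transformer block.
PER_LAYER = {
    "self_attn.q_proj": "attn_q",
    "self_attn.k_proj": "attn_k",
    "self_attn.v_proj": "attn_v",
    "self_attn.o_proj": "attn_output",
    "input_layernorm": "attn_norm",
    "post_attention_layernorm": "ffn_norm",
    "mlp.gate_proj": "ffn_gate",
    "mlp.up_proj": "ffn_up",
    "mlp.down_proj": "ffn_down",
}

LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)\.(weight|bias)$")


def to_gguf_name(hf_name):
    """Translate a Llama-style parameter name, or return None if unknown."""
    if hf_name in TOP_LEVEL:
        return TOP_LEVEL[hf_name]
    match = LAYER_RE.match(hf_name)
    if match and match.group(2) in PER_LAYER:
        layer, module, suffix = match.groups()
        return f"blk.{layer}.{PER_LAYER[module]}.{suffix}"
    return None  # caller decides whether to skip or fail on unknown names


if __name__ == "__main__":
    for name in ("model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.weight"):
        print(name, "->", to_gguf_name(name))
```

Returning `None` for unknown names keeps the decision to skip or abort with the caller, which is useful when a checkpoint contains buffers that have no GGUF counterpart.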

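```python
# 6. Write the GGUF file: header, then metadata, then tensor data
builder.write_header_to_file()
builder.write_kv_data_to_file()
builder.write_tensors_to_file()
builder.close()
```
- Finalizes the process by writing the header, the metadata key-value pairs, and the tensor data to the output GGUF file, then closing it
- This produces a structurally valid GGUF file. For an inference engine like llama.cpp to load it, the file must also contain the metadata keys, tokenizer data, and tensor names that the engine expects for the chosen architecture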
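
A quick way to sanity-check the output is to read back the fixed-size header at the start of the file. The sketch below assumes the little-endian layout used by GGUF versions 2 and 3 (4-byte magic `GGUF`, 32-bit version, 64-bit tensor count, 64-bit metadata count); the function name is illustrative, and it deliberately does not parse the metadata or tensor sections.

```python
import struct

GGUF_MAGIC = b"GGUF"
# Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
HEADER_STRUCT = struct.Struct("<IQQ")


def read_gguf_header(path):
    """Read the fixed header fields of a GGUF v2/v3 file."""
    with open(path, "rb") as f:
        magic = f.read(len(GGUF_MAGIC))
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file (magic was {magic!r})")
        raw = f.read(HEADER_STRUCT.size)
    if len(raw) != HEADER_STRUCT.size:
        raise ValueError("file is too short to contain a GGUF header")
    version, tensor_count, kv_count = HEADER_STRUCT.unpack(raw)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}
```

Calling `read_gguf_header("output_model.gguf")` after the conversion should report a tensor count equal to the number of tensors that were added.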

This conversion process preserves the model architecture and weights while transforming them into a format optimized for efficient inference on various devices.