Cloudflare Docs

Using LoRA adapters

Workers AI supports fine-tuned inference with adapters trained with Low-Rank Adaptation. This feature is in open beta and free during this period.

  • We only support LoRA adapters (which must not be quantized) for the following models:

    • @cf/meta/llama-3.2-11b-vision-instruct
    • @cf/meta/llama-3.3-70b-instruct-fp8-fast
    • @cf/meta/llama-guard-3-8b
    • @cf/meta/llama-3.1-8b-instruct-fast (soon)
    • @cf/deepseek-ai/deepseek-r1-distill-qwen-32b
    • @cf/qwen/qwen2.5-coder-32b-instruct
    • @cf/qwen/qwq-32b
    • @cf/mistralai/mistral-small-3.1-24b-instruct
    • @cf/google/gemma-3-12b-it
  • Adapters must be trained with rank r <= 8, though larger ranks of up to 32 are also supported. You can check the rank of a pre-trained LoRA adapter in the adapter's config.json file

  • LoRA adapter file must be < 300MB

  • LoRA adapter files must be named adapter_config.json and adapter_model.safetensors exactly

  • You can test up to 30 LoRA adapters per account


We have started a Hugging Face Collection that lists a few LoRA adapters that are compatible with Workers AI. Generally, any LoRA adapter that fits our limitations above should work.

To train your own LoRA adapter, follow the tutorial.


In order to run inference with LoRAs on Workers AI, you need to create a new fine-tune on your account and upload your adapter files. You should have an adapter_model.safetensors file with the model weights and an adapter_config.json file with your configuration. Note that we only accept adapter files in these two formats.

Right now, you cannot edit a fine-tune's asset files after you upload them. We will support this soon, but for now you need to create a new fine-tune and upload your files again if you would like to use a new LoRA.

Before you upload your LoRA adapter, you need to edit your adapter_config.json file to include model_type as one of mistral, gemma, or llama, as shown below.

{
  "alpha_pattern": {},
  "auto_mapping": null,
  ...
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "model_type": "mistral"
}

You can create a finetune and upload your LoRA adapter via wrangler with the following commands:

wrangler CLI
npx wrangler ai finetune create <model_name> <finetune_name> <folder_path>
#🌀 Creating new finetune "test-lora" for model "@cf/mistral/mistral-7b-instruct-v0.2-lora"...
#🌀 Uploading file "/Users/abcd/Downloads/adapter_config.json" to "test-lora"...
#🌀 Uploading file "/Users/abcd/Downloads/adapter_model.safetensors" to "test-lora"...
#✅ Assets uploaded, finetune "test-lora" is ready to use.
npx wrangler ai finetune list
┌──────────────────────────────────────┬─────────────────┬─────────────┐
│ finetune_id                          │ name            │ description │
├──────────────────────────────────────┼─────────────────┼─────────────┤
│ 00000000-0000-0000-0000-000000000000 │ test-lora       │             │
└──────────────────────────────────────┴─────────────────┴─────────────┘

Alternatively, you can use our REST API to create a finetune and upload your adapter files. You will need a Cloudflare API Token with Workers AI: Edit permissions to make calls to our REST API, which you can generate via the Cloudflare Dashboard.

cURL
## Input: user-defined name of fine tune
## Output: unique finetune_id
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
-H "Authorization: Bearer {API_TOKEN}" \
-H 'Content-Type: application/json' \
-d '{
"model": "SUPPORTED_MODEL_NAME",
"name": "FINETUNE_NAME",
"description": "OPTIONAL_DESCRIPTION"
}'

You have to call the upload endpoint once per file, so you will usually run it twice: once for adapter_model.safetensors and once for adapter_config.json. Make sure you include the @ before each file path.

You can use either the name or the ID that you received when you created the fine-tune.

cURL
## Input: finetune_id, adapter_model.safetensors, then adapter_config.json
## Output: success true/false
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/{FINETUNE_ID}/finetune-assets/ \
-H 'Authorization: Bearer {API_TOKEN}' \
-H 'Content-Type: multipart/form-data' \
-F 'file_name=adapter_model.safetensors' \
-F 'file=@{PATH/TO/adapter_model.safetensors}'

You can call this endpoint to confirm which fine-tunes you have created on your account:

cURL
## Input: n/a
## Output: list of fine-tunes on the account
curl -X GET https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
-H 'Authorization: Bearer {API_TOKEN}'

To make inference requests and apply the LoRA adapter, you will need your model plus the fine-tune name or ID. You should use the chat template that your LoRA was trained on, but you can try running it with raw: true and the messages format, as shown below.

const response = await env.AI.run(
  "@cf/mistralai/mistral-7b-instruct-v0.2-lora", // the model supporting LoRAs
  {
    messages: [{ role: "user", content: "Hello world" }],
    raw: true, // skip applying the default chat template
    lora: "00000000-0000-0000-0000-000000000000", // the fine-tune ID or name
  },
);