# Fine-tuning Falcon 7B/40B LLM

Falcon is a family of open-source large language models (LLMs) with 7 billion (Falcon-7B) and 40 billion (Falcon-40B) parameters. Falcon-7B was trained on 1.5 trillion tokens and Falcon-40B on 1 trillion tokens.

You can fine-tune Falcon on a Q Blocks cloud GPU instance by running the following installation and launch commands.

**GPU configuration**

* We recommend a GPU with at least 40GB of memory, such as 1x A100 40GB/80GB, 1x A6000 or 2x A100 80GB. Select it from the Data center nodes option when launching a GPU instance on the [Q Blocks platform](https://www.qblocks.cloud/client/v2/create-instance).
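
Before launching a run, you can confirm that the instance actually has enough GPU memory. The Bash function below is a minimal sketch: `gpu_meets_min_mem` is a hypothetical helper (not part of Q Blocks or the fine-tuning script) that reads the per-GPU `memory.total` values printed by `nvidia-smi` in MiB and fails if any GPU falls below a threshold you choose.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every GPU meets a minimum memory size.
# Reads one memory.total value (in MiB) per line on stdin, i.e. the output of:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
gpu_meets_min_mem() {
  local min_mib="$1" mem count=0
  while read -r mem || [ -n "$mem" ]; do
    [ -z "$mem" ] && continue
    # Reject non-numeric values such as "[N/A]".
    case "$mem" in
      *[!0-9]*) echo "unexpected memory value: $mem" >&2; return 1 ;;
    esac
    count=$((count + 1))
    if [ "$mem" -lt "$min_mib" ]; then
      echo "GPU with ${mem} MiB is below ${min_mib} MiB" >&2
      return 1
    fi
  done
  # Treat "no GPUs reported" as a failure.
  [ "$count" -gt 0 ]
}

# Example on a GPU instance (40000 MiB is only a rough cut-off for 40GB-class cards):
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
#     | gpu_meets_min_mem 40000 && echo "GPU memory OK"
```

The threshold is passed in rather than hard-coded because the reported total can differ slightly from a card's marketed capacity.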

**Install Miniconda**

```bash
# Download the latest Miniconda installer (-nc skips the download if the file already exists).
wget -nc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Install in batch mode (-b skips the prompts); the default location is ~/miniconda3.
bash Miniconda3-latest-Linux-x86_64.sh -b

# Activate conda in the current shell.
eval "$("$HOME/miniconda3/bin/conda" shell.bash hook)"

# (Optional) Add the activation command to ~/.bashrc so you don't have to run it in every new shell.
printf '\neval "$("$HOME/miniconda3/bin/conda" shell.bash hook)"\n' >> ~/.bashrc
```
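
If you rerun setup on an instance where Miniconda may already be installed, the same steps can be wrapped so that the download and install are skipped when conda is present. This is a sketch under assumptions: `ensure_miniconda` and `MINICONDA_PREFIX` are hypothetical names, and the prefix defaults to `$HOME/miniconda3`, the installer's usual batch-mode location (`/home/qblocks/miniconda3` for the `qblocks` user). `-p` is the installer's option for choosing the install prefix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install Miniconda only if it is missing, then activate it.
# MINICONDA_PREFIX is an assumed variable; it defaults to $HOME/miniconda3.
ensure_miniconda() {
  local prefix="${MINICONDA_PREFIX:-$HOME/miniconda3}"
  local installer="Miniconda3-latest-Linux-x86_64.sh"
  if [ ! -x "$prefix/bin/conda" ]; then
    wget -nc "https://repo.anaconda.com/miniconda/$installer" || return 1
    # -b: batch mode (no prompts), -p: installation prefix.
    bash "$installer" -b -p "$prefix" || return 1
  fi
  # Make `conda activate` available in the current shell.
  eval "$("$prefix/bin/conda" shell.bash hook)"
}

# Usage: ensure_miniconda && conda --version
```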

**Set up the environment**

Create a conda environment and install PyTorch and the other dependencies:

```bash
# Create and activate env. -y skips confirmation prompt.
conda create -n falcon-env python=3.9 -y
conda activate falcon-env

# Install PyTorch with CUDA 11.8 support (-y skips the confirmation prompt)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y

# Install other dependencies
pip install -U accelerate einops sentencepiece git+https://github.com/huggingface/transformers.git && \
pip install -U trl git+https://github.com/huggingface/peft.git && \
pip install scipy datasets bitsandbytes wandb
```
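
To catch a failed install early, you can check that the key packages import inside `falcon-env`. The `check_py_modules` helper below is hypothetical and deliberately simple: it only tries `import <name>` for each argument, so it confirms that a package is installed, not that its version is compatible or that the GPU is usable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report Python modules that cannot be imported.
# Returns non-zero if any of the named modules is missing.
check_py_modules() {
  local py mod missing=0
  py="$(command -v python || command -v python3)" || { echo "no python found" >&2; return 1; }
  for mod in "$@"; do
    if ! "$py" -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$mod" >/dev/null 2>&1; then
      echo "missing: $mod" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage inside falcon-env:
#   check_py_modules torch transformers accelerate peft trl bitsandbytes datasets
#   python -c "import torch; print(torch.cuda.is_available())"
```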

**Start the run**

Download the fine-tuning script and run it inside the conda environment:

```bash
# Download the fine-tuning script
wget https://qbcontent.nyc3.cdn.digitaloceanspaces.com/finetuning/finetune-falcon.py

# Activate the conda environment
eval "$("$HOME/miniconda3/bin/conda" shell.bash hook)"
conda activate falcon-env

# Single GPU, Falcon-7B, 4-bit quantization
torchrun --nnodes 1 --nproc_per_node 1 \
  finetune-falcon.py \
  -m ybelkada/falcon-7b-sharded-bf16 \
  -q 4bit

# 2x GPUs, Falcon-40B, 4-bit quantization
torchrun --nnodes 1 --nproc_per_node 2 \
  finetune-falcon.py \
  -m tiiuae/falcon-40b \
  -q 4bit
```
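
As a rough, back-of-the-envelope guide to why the `-q 4bit` setting matters, the memory needed just to hold the model weights is the parameter count $$N$$ times the bits per parameter $$b$$, divided by 8. Using the nominal 7B and 40B sizes, and counting weights only (activations, quantization constants, and any trainable parameters with their optimizer state all come on top):

$$
\begin{aligned}
\text{weight memory} &\approx N \times \frac{b}{8}\ \text{bytes} \\
\text{Falcon-7B:}\quad 7\times10^{9}\times\tfrac{16}{8} &= 14\ \text{GB (bf16)}, & 7\times10^{9}\times\tfrac{4}{8} &= 3.5\ \text{GB (4-bit)} \\
\text{Falcon-40B:}\quad 40\times10^{9}\times\tfrac{16}{8} &= 80\ \text{GB (bf16)}, & 40\times10^{9}\times\tfrac{4}{8} &= 20\ \text{GB (4-bit)}
\end{aligned}
$$

By this estimate, the bf16 weights of Falcon-40B alone would already fill an 80GB card, while the 4-bit weights leave headroom on the GPUs recommended above.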

More parameters can be passed to the script, such as `--dataset_name`, `--steps` and `--batch_size_per_device`.
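
If you launch many runs with different settings, a small wrapper can assemble the `torchrun` command. This is a sketch only: `launch_falcon_finetune` and the `FALCON_*` / `DRY_RUN` environment variables are hypothetical names, the defaults simply mirror the single-GPU command above, and this page does not document what values `--dataset_name`, `--steps` and `--batch_size_per_device` accept, so any values you pass are assumptions to check against `finetune-falcon.py`.

```bash
#!/usr/bin/env bash
# Hypothetical launcher: builds the torchrun command for finetune-falcon.py
# from environment variables. Defaults mirror the single-GPU example.
launch_falcon_finetune() {
  local model="${FALCON_MODEL:-ybelkada/falcon-7b-sharded-bf16}"
  local quant="${FALCON_QUANT:-4bit}"
  local ngpus="${FALCON_NGPUS:-1}"
  local -a cmd
  cmd=(torchrun --nnodes 1 --nproc_per_node "$ngpus"
       finetune-falcon.py -m "$model" -q "$quant")
  # Optional flags are appended only when the matching variable is set.
  [ -n "${FALCON_DATASET:-}" ] && cmd+=(--dataset_name "$FALCON_DATASET")
  [ -n "${FALCON_STEPS:-}" ] && cmd+=(--steps "$FALCON_STEPS")
  [ -n "${FALCON_BATCH:-}" ] && cmd+=(--batch_size_per_device "$FALCON_BATCH")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (values are illustrative):
#   DRY_RUN=1 FALCON_MODEL=tiiuae/falcon-40b FALCON_NGPUS=2 launch_falcon_finetune
#   FALCON_STEPS=500 FALCON_BATCH=4 launch_falcon_finetune
```

The dry-run mode makes it easy to inspect the exact command before committing GPU time to it.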
