Training an LLM with Hugging Face
Mathew Hemphill
Senior Engineer
A Beginner's Guide to fine-tuning an LLM
In a previous article, I demonstrated how easy it is to add Natural Language Processing (NLP) to an application using the Hugging Face Transformers library.
In that article, there were some samples of how a Large Language Model (LLM) can be used to answer questions about a particular subject, such as the band AC/DC. In those samples, the knowledge about AC/DC was passed into the model with the question, i.e. as text with the inference prompt.
An alternative approach would be to fine-tune the model, by training it with data about AC/DC. In this way, the knowledge (context) does not need to be passed into the model with every inference.
This article demonstrates how to go about doing this fine-tuning using the Hugging Face Transformers PEFT library.
Prerequisites
You will need Python 3 installed; I used v3.13.3, but any recent version (>3.9) should suffice. Then create a virtual environment and install the dependencies with pip as follows.
Optionally, a requirements.txt file has been provided as an appendix, so you can use the exact same versions I used at the time of writing.
mkdir my-project
cd my-project
python3 -m venv venv
source venv/bin/activate
pip install transformers
pip install torch
pip install datasets
pip install 'transformers[torch]'
pip install peft
You will NOT need a Hugging Face account to run the sample code provided.
Network connectivity is required to download the base model. Once it has been downloaded, the training runs entirely locally, although if you run without a network connection the trainer will log some repeated warnings as it checks whether the base model has been updated.
Starting Point
We will start with a simple Python program that submits a question about AC/DC to an LLM and prints the answer.
Create a file named run-questions.py and paste the following snippet:
from transformers import pipeline
qa = pipeline("text2text-generation", model="google/flan-t5-small")
question = "When was ACDC formed?"
knowledge = """
ACDC is the name of a band that was formed in Sydney in 1973.
The members of the band include Malcolm as the rhythm guitarist and Angus as the lead guitarist.
"""
result = qa("Context: " + knowledge + " Question: " + question)
print(result)
Run it with the command python run-questions.py.
Using the knowledge supplied, it will correctly answer the question “When was AC/DC formed?” with “1973”. However, if you were to remove the lines from the knowledge variable and rerun, the base model is unable to answer the question (accurately).
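For comparison, the no-context variant simply submits the question on its own. A minimal sketch:
from transformers import pipeline
qa = pipeline("text2text-generation", model="google/flan-t5-small")
# No context is supplied, so the small base model has nothing to ground its answer in.
result = qa("Question: When was ACDC formed?")
print(result)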
Fine-Tuning
Rather than passing the knowledge with the question (inference), let’s instead pre-train the model with the knowledge about AC/DC.
Hugging Face provides the PEFT library, where PEFT stands for Parameter-Efficient Fine-Tuning. It offers an efficient means of fine-tuning a model with additional training data, because only a small number of extra (adapter) parameters are trained while the base model's weights remain frozen.
Training Data
A dataset of training data is required. This can be supplied as question and answer pairs in a JSON Lines file named acdc_qa.json: each line is a standalone JSON object, with no enclosing JSON array and no commas between lines. A quick way to validate the file is shown after the sample below.
{"question":"When was ACDC formed?", "answer": "1973"}
{"question":"What year was ACDC formed?", "answer": "1973"}
{"question":"What is the name of the band that was formed in Sydney in 1973?", "answer": "ACDC"}
{"question":"Where was ACDC formed?", "answer": "Sydney"}
{"question":"Who are the members of ACDC?", "answer": "Malcolm Young and Angus Young"}
{"question":"What role does Malcolm play in ACDC?", "answer": "rhythm guitarist"}
{"question":"What role does Angus play in ACDC?", "answer": "lead guitarist"}
I hand-wrote the questions above. However, fine-tuning requires more data, and more varied data, so I asked GitHub Copilot to generate another 50 questions about AC/DC using the above as a template.
If you don't have access to GitHub Copilot, you could try your favourite AI assistant, e.g. ChatGPT or Claude. Having GitHub Copilot integrated into Visual Studio Code is extremely handy.
Through trial and error, I found that the more varied the training data, the better the fine-tuning results. As such, I asked GitHub Copilot to generate a further 50 questions about the bands "Cold Chisel" and the "Foo Fighters". Including these extra questions results in a model that is better at answering questions about bands in general.
In total, the training data set I used contained around 135 question/answer pairs, and that’s likely the bare minimum required to get any decent results.
Training
Create a file named run-trainer.py and paste the following imports:
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import Trainer, AutoModelForSeq2SeqLM
from transformers import Seq2SeqTrainingArguments
from peft import LoraConfig, TaskType, get_peft_model
The first thing the training program must do is load the training data and tokenise it into a format suitable for the base model being trained. Hugging Face makes this convenient with the AutoTokenizer class and the load_dataset function:
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("json", data_files="acdc_qa.json")
def preprocess(example):
    # Tokenise the question (model input) and the answer (training target).
    # The questions and answers are short, so truncation is left disabled.
    inputs = tokenizer(example["question"], max_length=128, truncation=False, padding="max_length")
    targets = tokenizer(example["answer"], max_length=128, truncation=False, padding="max_length")
    inputs["labels"] = targets["input_ids"]
    return inputs

tokenized_dataset = dataset.map(preprocess)
tokenized_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
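If you want to see what the preprocessing produces, a minimal sketch for inspecting the first tokenised example (using the variables defined above) is:
# Decode the first example back to text to confirm the input/label pairing.
sample = tokenized_dataset["train"][0]
print(tokenizer.decode(sample["input_ids"], skip_special_tokens=True))  # the question
print(tokenizer.decode(sample["labels"], skip_special_tokens=True))     # the expected answer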
Next, prepare the model for training with PEFT:
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # sequence-to-sequence task (T5-style model)
    inference_mode=False,             # we are training, not running inference
    r=8,                              # rank of the LoRA update matrices
    lora_alpha=32,                    # scaling factor for the LoRA updates
    lora_dropout=0.1
)
model = get_peft_model(model, peft_config)
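With LoRA, only the small adapter matrices are trained; the base model weights stay frozen. You can confirm how few parameters are actually trainable with the helper PEFT provides:
# Prints the number of trainable parameters versus total parameters.
model.print_trainable_parameters()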
Then, configure and run the trainer:
training_args = Seq2SeqTrainingArguments(
    output_dir="./acdc-finetuned-model",
    per_device_train_batch_size=8,
    num_train_epochs=100,
    logging_steps=1,
    push_to_hub=False,
    learning_rate=1e-3,
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['train'].select(range(20)),
)
trainer.train()
You can experiment with the training arguments, especially the num_train_epochs.
An epoch is one pass over the training data.
I found at least 100 epochs were required before the model would start answering questions with AC/DC facts. I also tried increasing this to 300 and the results were slightly better, in that it could answer more questions accurately. When increasing the epochs, you want the loss being logged by the trainer to be trending downwards. If the loss is no longer trending downwards, there’s no point adding more epochs.
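One way to check this after trainer.train() has finished is to look at the losses the trainer recorded in trainer.state.log_history. A minimal sketch, using the variable names from the script above:
# Collect the logged training losses and eyeball whether they are still decreasing.
losses = [entry["loss"] for entry in trainer.state.log_history if "loss" in entry]
print(losses[:5], "...", losses[-5:])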
The trainer is provided with both training and evaluation datasets. For the latter, I simply used the first 20 questions from the training dataset. This is acceptable for this use case, where the intention is to make the model answer questions in an FAQ style. If the intention is to have a model that can generalise and answer questions it hasn't seen before, it would be better to have a separate set of questions for evaluation.
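If you do want a held-out evaluation set, the datasets library can split the data for you. A sketch of that approach, which would replace the train_dataset and eval_dataset arguments shown earlier:
# Hold out 10% of the examples for evaluation instead of reusing the training data.
split = tokenized_dataset["train"].train_test_split(test_size=0.1, seed=42)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
)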
And finally, save the fine-tuned model and tokeniser to local disk, from where they can be used later.
model.save_pretrained("./acdc-finetuned-model")
tokenizer.save_pretrained("./acdc-finetuned-model")
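Note that calling save_pretrained on a PEFT model saves only the small LoRA adapter weights (plus an adapter_config.json), not a full copy of the base model. Because the peft library is installed, the Transformers pipeline used in the next section can load this adapter directory directly. If you would prefer a single standalone model with the adapter weights folded into the base weights, PEFT offers merge_and_unload. A sketch, using a hypothetical separate output directory:
# Merge the LoRA weights into the base model and save a plain (non-PEFT) model.
merged = model.merge_and_unload()
merged.save_pretrained("./acdc-finetuned-model-merged")
tokenizer.save_pretrained("./acdc-finetuned-model-merged")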
Run the trainer with the command python run-trainer.py and wait for it to complete. The training runtime is governed by the number of epochs. Allow at least 15–20 minutes for the trainer to complete.
Using the Fine-Tuned Model
Replace the contents of the run-questions.py file with the following code:
from transformers import pipeline
qa_pipeline = pipeline("text2text-generation", model="./acdc-finetuned-model", tokenizer="./acdc-finetuned-model")
questions = [
    "When was ACDC formed?",
    "Where was ACDC formed?",
    "List the members of Cold Chisel.",
    "List the members of ACDC.",
]

for question in questions:
    answer = qa_pipeline(question)
    print(f"{question} Answer: {answer[0]['generated_text']}")
The above code loads the fine-tuned model and establishes a Hugging Face Transformers pipeline ready to generate answers to questions about AC/DC.
Note there is no longer any knowledge being passed with the questions as context.
As shown below, the fine-tuned model is capable of answering questions about AC/DC (and Cold Chisel).

It only lists 2 members of AC/DC. This is because I only included Malcolm and Angus in the training data!
Wrap Up
This article has demonstrated how an LLM can be fine-tuned on a particular subject so that it can answer questions about that subject. This avoids the need to pass knowledge with every question, keeping the token count to a minimum. Depending upon where and how you host your LLM, the token count of each inference can contribute to your usage costs. Further, each LLM has an upper limit known as the context window, which limits how much data can be passed into the model with an inference before performance and accuracy start to decline.
In this case, where the original knowledge was only two lines, there's not much practical benefit. However, this approach is beneficial when the knowledge spans megabytes or gigabytes across multiple documents. This trivial example demonstrates the process of fine-tuning, and how the Hugging Face PEFT library makes that process easier.
Further, it indicates how much effort it takes to fine-tune a model, both in preparing the training data and in the processing power and time required to run the training. Although the fine-tuned model could answer some questions, it was far from perfect, suggesting more time and effort would be required to refine it further.
Appendix — requirements.txt
Save the following to a text file named requirements.txt, then activate your Python virtual environment and run pip install -r requirements.txt:
accelerate==1.9.0
aiohappyeyeballs==2.6.1
aiohttp==3.12.15
aiosignal==1.4.0
attrs==25.3.0
certifi==2025.7.14
charset-normalizer==3.4.2
datasets==4.0.0
dill==0.3.8
filelock==3.18.0
frozenlist==1.7.0
fsspec==2025.3.0
hf-xet==1.1.5
huggingface-hub==0.34.3
idna==3.10
Jinja2==3.1.6
MarkupSafe==3.0.2
mpmath==1.3.0
multidict==6.6.3
multiprocess==0.70.16
networkx==3.5
numpy==2.3.2
packaging==25.0
pandas==2.3.1
peft==0.16.0
propcache==0.3.2
psutil==7.0.0
pyarrow==21.0.0
python-dateutil==2.9.0.post0
pytz==2025.2
PyYAML==6.0.2
regex==2025.7.34
requests==2.32.4
safetensors==0.5.3
setuptools==80.9.0
six==1.17.0
sympy==1.14.0
tokenizers==0.21.4
torch==2.7.1
tqdm==4.67.1
transformers==4.55.0
typing_extensions==4.14.1
tzdata==2025.2
urllib3==2.5.0
xxhash==3.5.0
yarl==1.20.1