Using LLMs with Hugging Face

Mathew Hemphill

Senior Engineer

September 18, 2025

Beginner's Guide to Open Source Natural Language Processing

Hugging Face is an open source platform where a community of engineers can collaborate on large language models (LLMs), machine learning datasets and applications that make use of those models.

Unlike competitors that provide proprietary models under commercial licenses, Hugging Face focuses on open source models that tend to be smaller and can be run locally.

Hugging Face also provides a hosting service where you can pay for the compute used to run queries against your LLMs. Applications interact with your LLM via (inference) endpoints, just like they would call any other API.
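To give a flavour of this, calling a hosted inference endpoint is just an HTTP request. The sketch below uses a placeholder endpoint URL and reads a (hypothetical) access token from an environment variable; the JSON payload shape is the standard one for text inputs.

// call-endpoint.js: sketch only; the endpoint URL is a placeholder, not a real deployment.
const ENDPOINT_URL = 'https://your-endpoint.endpoints.huggingface.cloud';
const HF_TOKEN = process.env.HF_TOKEN; // access token generated in your Hugging Face account

const response = await fetch(ENDPOINT_URL, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${HF_TOKEN}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ inputs: 'The fish was delicious however the service could have been better.' }),
});

console.log(await response.json());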

This article focuses on the open source libraries maintained by Hugging Face, in particular the Transformers library, which makes it easy to interact with LLMs. It will demonstrate how little code is required to add Natural Language Processing (NLP) functionality to an application.

Being more of a software developer than a data scientist, I have chosen to explore the Transformers.js Javascript library. The code samples shown could equally be written in Python, and towards the end of the article I will point out some reasons why you might be better off using the Python library.

Prerequisites

The default LLM models will be used, so you do NOT require a Hugging Face account or an (API) access token to run these samples.

The documentation states that Transformers.js runs directly in the browser, and therefore there's no need for a server. This is true, but you can also run it in a Node server, and this article will demonstrate some simple Node programs that use the library.

Therefore you need Node installed. I used v22.14.0; however, the latest LTS should be fine.

Initialise a new project and install Hugging Face with the following commands:

mkdir my-project
cd my-project
npm init
npm i @huggingface/transformers

Ensure your package.json contains "type": "module" to use ECMAScript modules.
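For reference, a minimal package.json after these steps might look something like this (the version number is illustrative):

{
  "name": "my-project",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@huggingface/transformers": "^3.0.0"
  }
}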

Pipeline API

The simplest way to interact with an LLM locally is via the Pipeline API.

Import the pipeline:

import { pipeline } from '@huggingface/transformers';

Then instantiate a pipeline, specifying a Task and optionally a specific model (if you don't specify one, a default model will be used):

const sentimentAnalysis = await pipeline('sentiment-analysis');

Then invoke the pipeline, passing it the input; it will return the output:

const out = await sentimentAnalysis('The fish was delicious however the service could have been better.');
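Putting those three lines together, a complete script (saved as run-sentiment.js, a filename of my choosing) looks like the following. The sentiment-analysis task returns an array of label/score objects, something along the lines of [{"label":"POSITIVE","score":0.98}]; the exact labels and scores depend on the model.

import { pipeline } from '@huggingface/transformers';

// Load the default sentiment-analysis model (downloaded and cached on first run).
const sentimentAnalysis = await pipeline('sentiment-analysis');

// Classify a piece of text and print the label/score result.
const out = await sentimentAnalysis('The fish was delicious however the service could have been better.');
console.log(out);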

There are many tasks supported by the Pipeline API. This article will demonstrate the following:

  • Sentiment-analysis — as seen above, this accepts a string as input and classifies the input as being either positive, negative or neutral.
  • Zero-shot-classification — similar to sentiment analysis in that the input is classified by the model, however you can provide a list of classifications (labels) without needing to provide examples of each.
  • Text2text-generation — generates text output from text input.
  • Translation — translates text input to text output in a specified language.
  • Question-answering — given some context data and a question, retrieves the answer to the question from the context.

This article includes working samples of each of these Tasks. Refer to the documentation for a full list of Tasks. It’s worth pointing out there is support for multimodal models, i.e. text, images and audio.

Note that different models are optimised for specific tasks, for example BERT for text classification and bertweet-base-sentiment for sentiment analysis. So factor in the task being performed when choosing a model.

Samples

The sample programs that follow will download the models to the local filesystem. The first run will take quite a few minutes as the models are 2–3 GB in size. Subsequent runs are much faster once the models are cached on the filesystem.
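By default the library manages the cache location for you. If you want to control where the models are stored, the library exposes an env configuration object; a minimal sketch, assuming env.cacheDir is available in the version you have installed:

import { env, pipeline } from '@huggingface/transformers';

// Store downloaded models under a folder of your choosing (example path).
env.cacheDir = './models-cache';

const sentimentAnalysis = await pipeline('sentiment-analysis');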

Sample Zero Shot Classification

Zero Shot Classification is similar to sentiment analysis in that the input is classified by the model, however you can provide a list of classifications (labels) without needing to provide examples of each.

Create a file named run-classifier.js and paste the following snippet:

import { pipeline } from '@huggingface/transformers';

async function runClassifier() {
  const classifier = await pipeline('zero-shot-classification', 'Xenova/nli-deberta-v3-xsmall');
  const classes = ['technical support', 'complaint', 'inquiry', 'billing'];
  const result = await classifier('My television is not working, and I need to organise a repair.', classes);
  console.log(JSON.stringify(result));

  const result2 = await classifier('My credit card recently expired and my subscription is due to be paid soon.', classes);
  console.log(JSON.stringify(result2));
}

await runClassifier();

Run by entering node run-classifier.js into a terminal. The output (shown formatted below) is a list of all the available classes plus a score for each, where a higher score means the model believes the class/label to be a better fit for the input text. In the output, the labels are sorted in descending order, where the most preferred class/label is listed first.

{
  "sequence": "My television is not working, and I need to organise a repair.",
  "labels": [
    "technical support",
    "inquiry",
    "complaint",
    "billing"
  ],
  "scores": [
    0.7734760213837237,
    0.11493138081071165,
    0.09830344578778148,
    0.013289152017783159
  ]
}

{
  "sequence": "My credit card recently expired and my subscription is due to be paid soon.",
  "labels": [
    "billing",
    "complaint",
    "inquiry",
    "technical support"
  ],
  "scores": [
    0.8633786926394368,
    0.08957753346219464,
    0.04327964997070977,
    0.0037641239276587383
  ]
}
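Because the labels are sorted by descending score, picking the top classification is straightforward; for example, continuing from the snippet above:

// The best-matching label is first; its score is at the same position in the scores array.
const topLabel = result.labels[0];
const topScore = result.scores[0];
console.log(`Classified as "${topLabel}" with a score of ${topScore}`);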

Sample Text Generator

This example generates text output from text input.

Create a file named run-generator.js and paste the following snippet:

import { pipeline } from '@huggingface/transformers';

async function runGenerator() {
  const generator = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M');
  const result = await generator('Write a haiku about large learning models.', {
    max_new_tokens: 200,     // cap on how many tokens the model may generate
    temperature: 0,          // prefer the most likely tokens rather than sampling creatively
    repetition_penalty: 2.0, // penalise tokens the model has already produced
    no_repeat_ngram_size: 3, // block any 3-token sequence from repeating
  });

  console.log(result[0].generated_text);
}

await runGenerator();

Run by entering node run-generator.js into a terminal. It will generate a one-line haiku about large learning models (example below). You can alter the input to the generator to make it generate other, more exciting things.

Large learning models, Deeply understanding data's patterns.

Sample Translator

This example translates text input to text output in a specified language.

Create a file named run-translator.js and paste the following:

import { pipeline } from '@huggingface/transformers';

async function runTranslator() {
  const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');

  const targetLanguage = "deu_Latn";
  const english = 'These Pretzels are making me thirsty.';
  const translated = await translator(english, {
    src_lang: 'eng_Latn',
    tgt_lang: targetLanguage,
  });

  const backToEnglish = await translator(translated[0].translation_text, {
    src_lang: targetLanguage,
    tgt_lang: 'eng_Latn'
  });

  console.log(`Original: ${english}`);
  console.log(`Translated: ${translated[0].translation_text}`);
  console.log(`Translated back to English: ${backToEnglish[0].translation_text}`);
}

await runTranslator();

Run by entering node run-translator.js into a terminal. It will output the following:

Original: These Pretzels are making me thirsty.
Translated: Diese Pretzels machen mich durstig.
Translated back to English: These brats make me thirsty.

It has successfully translated the English phrase “These Pretzels are making me thirsty” to German. However, upon translation back to English, the word Pretzel has been lost in translation, replaced with brats 🤷‍♂ (when I used French, the translation worked full circle).
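The language codes follow the NLLB (FLORES-200) convention, so trying another language is just a matter of changing tgt_lang. For example, the French round trip I mentioned uses 'fra_Latn'; continuing with the translator from the sample above:

// Same pipeline as above, translating to French instead of German.
const french = await translator(english, {
  src_lang: 'eng_Latn',
  tgt_lang: 'fra_Latn',
});
console.log(french[0].translation_text);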

Sample Question Answering

In this example, given some context data and a question, the answer is retrieved from the context.

Create a file named knowledge.txt and paste the following brief (and incomplete) blurb about ACDC:

ACDC is the name of a band that was formed in Sydney in 1973.
The members of the band include Malcolm as the rhythm guitarist and Angus as the lead guitarist.

Create a file named run-questions.js and paste the following:

import { pipeline } from '@huggingface/transformers';
import fs from 'fs';

async function runQuestions() {
  const knowledge = fs.readFileSync('knowledge.txt', 'utf8');

  const qa = await pipeline('question-answering', 'Xenova/distilbert-base-cased-distilled-squad');

  const question = 'What is the name of the band that was formed in Sydney in 1973?';
  const result = await qa(question, knowledge);

  console.log(`Question: ${question}`);
  console.log(`Answer: ${result.answer} (Score: ${result.score})`);
}

await runQuestions();

Run by entering node run-questions.js into a terminal.

It will correctly answer ACDC in response to the question “What is the name of the band that was formed in Sydney in 1973”.

Try modifying the question variable to ask it various questions about ACDC or about random topics. The question-answering task is fine when you ask it something where the answer is literally contained in the supplied knowledge base. If you ask it something that is not literally contained in the knowledge, it will return incorrect answers.

For example, if you were to ask it: “How many members are in ACDC?”, it can’t derive the number from the knowledge provided. Instead, it will return what it determines to be the most relevant excerpt from the knowledge, e.g. “Malcolm as the rhythm guitarist and Angus as the lead guitarist”. However, the text2text-generator task can also be used to answer questions about a knowledge base as demonstrated in the next section.

The question-answering task returns a score with each answer. You can code a minimum confidence threshold into your application for any answers returned, though you will need trial and error to establish a suitable threshold. This can be quite difficult in practice, as the model tends to be confident in the answer it returns, even though from the user's perspective that is probably not the answer they were looking for.
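As a sketch, with an arbitrary threshold of 0.5 (a value you would need to tune for your own model and knowledge base), the check inside runQuestions might look like this:

const MIN_SCORE = 0.5; // arbitrary example threshold; tune for your use case

if (result.score >= MIN_SCORE) {
  console.log(`Answer: ${result.answer}`);
} else {
  console.log('Sorry, I could not find a confident answer to that question.');
}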

Sample Question Answering with Generator

You will require the knowledge.txt file used by the previous sample.

Create a file named run-questions-with-generator.js and paste the following:

import { pipeline } from '@huggingface/transformers';
import fs from 'fs';

async function runQuestionsWithGenerator() {
  const knowledge = fs.readFileSync('knowledge.txt', 'utf8');

  const generator = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M');
  const question = 'Who is the lead guitarist of ACDC?';
  const result = await generator(`Context: ${knowledge} Question: ${question}`, {
    max_new_tokens: 200,
    temperature: 0,
    repetition_penalty: 2.0,
    no_repeat_ngram_size: 3,
  });

  console.log(result[0].generated_text);
}

await runQuestionsWithGenerator();

Run by entering node run-questions-with-generator.js into a terminal.

It will correctly answer the question shown with “Angus is the lead guitarist of ACDC”. Further, if you modify the question variable to “How many members in ACDC?”, it will correctly answer “There are two members in ACDC”, derived from the knowledge provided.

Wrap Up

The samples provided demonstrate how little code it takes to start using AI and Natural Language Processing (NLP) in Node programs when you use the Hugging Face libraries. Having the option of a Javascript library is an advantage where the application developers are highly skilled in Node, Typescript and/or Javascript, and not skilled in Python. It also means the application architecture can consist entirely of Node without the need for a Python based component/layer to perform the NLP inferences.

Listed in the next section are some other things to consider before adopting Hugging Face open source AI for your applications, as well as some reasons to use the Python library rather than Javascript.

The samples all use the default models, which are freely available as open source. There are more advanced models available; however, some of these you must pay to use. Others are still free to use but require you to accept the terms of usage. In both cases, you will need to create a Hugging Face account to use these models. You then generate an access token in your account, and your programs log in to Hugging Face using this token. The same approach applies if the model being used is hosted remotely on Hugging Face's infrastructure.

Many models are available. As of September 2025 (the time of writing), the Hugging Face site indicates more than 1.8 million models are available. The website provides a selector tool that lets you filter by criteria. When filtering for those compatible with Transformers.js the number of models drops to just over 2K.

You can also filter by the number of parameters, which roughly equates to the size of the model; newer, more capable models are generally trained with an ever-increasing number of parameters. The more parameters, the more compute you will need to run the model.

A large part of your planning stage must be devoted to model selection based upon requirements, as well as cost and available computing power wherever your application is to be hosted.

Model usage is subject to software licenses, just like any open source software. It is important to check the license and terms of use before deciding to use a model. The license is visible on the Hugging Face website, displayed against each model’s details.

Why you might want to use Python

Both of the question/answer samples passed a knowledge base into the model with each question. The only way to provide knowledge to these models is by passing it as text input (context). This is fine for small amounts of data, but it would not be suitable for large libraries of knowledge. All models have an upper limit known as the "context window": the maximum amount of text (measured in tokens) the model can accept in a single request; beyond that point input is truncated or answer quality degrades. There are techniques to address this constraint, such as:

  • Breaking the knowledge into (smaller) chunks, invoking the model against each chunk and then determining the best answer (see the sketch after this list).
  • Identifying the relevant documents and chunks with semantic searches and only passing the most relevant chunks to the model.
  • Preprocessing the original documents by summarising them.
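As a minimal sketch of the first technique, reusing the question-answering pipeline from earlier (the chunking here is a naive split on blank lines; a real application would use a more considered strategy):

import { pipeline } from '@huggingface/transformers';
import fs from 'fs';

const qa = await pipeline('question-answering', 'Xenova/distilbert-base-cased-distilled-squad');
const question = 'Who is the lead guitarist of ACDC?';

// Naive chunking: split the knowledge base on blank lines.
const chunks = fs.readFileSync('knowledge.txt', 'utf8').split('\n\n').filter(Boolean);

// Ask the question against each chunk and keep the highest-scoring answer.
let best = { answer: null, score: 0 };
for (const chunk of chunks) {
  const result = await qa(question, chunk);
  if (result.score > best.score) {
    best = result;
  }
}

console.log(`Best answer: ${best.answer} (Score: ${best.score})`);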

The Hugging Face Python Transformers documentation details how to build a chatbot with Retrieval Augmented Generation (RAG). With such a chatbot, you can pass in an array of documents containing extra knowledge with which the chatbot can answer questions. If more sophisticated solutions such as this are required, it is better to use the Python libraries as opposed to the Javascript as — at the very least — the documentation appears more comprehensive. The Javascript library might be able to do the equivalent thing, however it will probably take you longer to figure out how.

In addition to RAG, the Python Hugging Face Transformers support “tools” — otherwise known as function calling. This allows you to define Python functions and to pass these definitions known as tools into the model. The model can then invoke these tool functions when processing an inference. The functions can perform queries to supply data back to the model on demand. These queries might be against an API for simple lookups or semantic searches against a vector database.

Not all models support tools or function calling. When viewing a model on the Hugging Face website you can look for the tags “tool-calling” or “function-calling”. You can perform a full text search on the site for these tags to list candidate models.

With Hugging Face it is also possible to fine-tune a model on a large (training) dataset of knowledge, and once your application is using a fine-tuned model, it is no longer necessary to pass the knowledge with each inference. Currently the training APIs are only available in Python. Also consider that fine-tuning a model requires a large amount of expertise, time, effort and compute.
