

Image by author
Have you ever wondered if there is a better way to install and run llama.cpp locally? Today, almost every local large language model (LLM) application depends on llama.cpp as the backbone for running models. But here's the catch: most setups are either overly complicated, require multiple tools, or don't give you a powerful user interface out of the box.
Wouldn't it be great if you could:
- Run a powerful model like gpt-oss-20b with just a few commands
- Get a modern web UI instantly, without extra hassle
- Have a fast, well-optimized setup for local inference
That is exactly what this tutorial is about.
In this guide, we will walk through the simplest and fastest way to run the gpt-oss-20b model locally using the llama-cpp-python package together with Open WebUI. By the end, you will have a fully working local LLM environment that is simple, efficient, and ready for production.
1. Setting Up the Environment

If you already have the uv command installed, your life just got easier. If not, don't worry: you can install it quickly by following the official uv installation guide.
Once uv is installed, open your terminal and install Python 3.12:
uv python install 3.12
Next, let’s set up a project directory, create a virtual environment, and activate it:
mkdir -p ~/gpt-oss && cd ~/gpt-oss
uv venv .venv --python 3.12
source .venv/bin/activate
2. Installing the Python Packages

Now that your environment is ready, let's install the required packages.
First, upgrade pip to the latest version. Then install the llama-cpp-python server package. This build comes with CUDA support (for NVIDIA GPUs), so if you have a compatible GPU, you will get maximum performance.
uv pip install --upgrade pip
uv pip install "llama-cpp-python[server]" --extra-index-url
Finally, install Open WebUI and the Hugging Face Hub client:
uv pip install open-webui huggingface_hub
- Open WebUI: a ChatGPT-style web interface for your local LLM server
- huggingface_hub: downloads and manages models directly from Hugging Face
3. Downloading the gpt-oss-20b Model

Next, let's download the gpt-oss-20b model in a quantized format (MXFP4) from Hugging Face. Quantized models are optimized for lower memory usage while still maintaining strong performance, which makes them ideal for running locally.
Run the following command in your terminal:
huggingface-cli download bartowski/openai_gpt-oss-20b-GGUF openai_gpt-oss-20b-MXFP4.gguf --local-dir models
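Once the download finishes, it is worth sanity-checking the file before serving it. Every GGUF file starts with the four magic bytes "GGUF", so a quick stdlib-only check can catch a truncated or corrupted download. The helper name `looks_like_gguf` and the path below are illustrative, not part of the tutorial's tooling:

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # every valid GGUF file begins with these four bytes

def looks_like_gguf(path) -> bool:
    """Quick sanity check that a downloaded file is a plausible GGUF model."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC

# Example (path matching the download command above):
# looks_like_gguf("models/openai_gpt-oss-20b-MXFP4.gguf")
```

This only validates the header, not the full file, but it is enough to catch the most common failure mode of an interrupted download.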
4. Serving gpt-oss-20b Locally Using llama.cpp

Now that the model has been downloaded, let's serve it using the llama.cpp server.
Run the following command in your terminal:
python -m llama_cpp.server \
--model models/openai_gpt-oss-20b-MXFP4.gguf \
--host 127.0.0.1 --port 10000 \
--n_ctx 16384
Here is what each flag does:
- --model: path to your quantized model file
- --host: local host address (127.0.0.1)
- --port: port number (10000 in this case)
- --n_ctx: context length (16,384 tokens for long conversations)
If everything is working, you will see logs like these:
INFO: Started server process [16470]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:10000 (Press CTRL+C to quit)
To confirm that the server is running and the model is available, run:
curl http://127.0.0.1:10000/v1/models
Expected output:
{"object":"list","data":[{"id":"models/openai_gpt-oss-20b-MXFP4.gguf","object":"model","owned_by":"me","permissions":[]}]}
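If you prefer to check from Python instead of curl, here is a small stdlib-only sketch that parses the /v1/models response and confirms the GGUF file is listed. The helper names (`model_ids`, `server_has_model`) are illustrative, and the base URL assumes the host and port used above:

```python
import json
import urllib.request

def model_ids(models_json: str) -> list:
    """Extract the model ids from a /v1/models response body."""
    data = json.loads(models_json)
    return [entry["id"] for entry in data.get("data", [])]

def server_has_model(base_url: str, name_fragment: str) -> bool:
    """Return True if any served model id contains name_fragment."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        ids = model_ids(resp.read().decode("utf-8"))
    return any(name_fragment in mid for mid in ids)

# Example (requires the server from this step to be running):
# server_has_model("http://127.0.0.1:10000", "gpt-oss-20b")
```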
Next, we will connect it to Open WebUI to give it a ChatGPT-style interface.
5. Launching Open WebUI

We have already installed the open-webui package. Now, let's launch it.
Open a new terminal window (keep your llama.cpp server running in the first one) and run:
open-webui serve --host 127.0.0.1 --port 9000
This will start the web UI server.
When you first open the link in your browser, you will be prompted to:
- Create an admin account (using your email and password)
- Log in to access the dashboard
This admin account keeps your settings, connections, and models secure across future sessions.
6. Connecting Open WebUI to llama.cpp

By default, Open WebUI is configured to work with Ollama. Since we are running our model with llama.cpp, we need to adjust the settings.
Follow these steps within the Web UI:
Add llama.cpp as an OpenAI connection
- Open the WebUI at http://127.0.0.1:9000 (or your forwarded URL).
- Click your avatar (top-right corner) → Admin Settings.
- Go to: Connections → OpenAI Connections.
- Edit the existing connection:
  - Base URL: http://127.0.0.1:10000/v1
  - API Key: (leave blank)
- Save the connection.
- (Optional) Disable the Ollama API and Direct Connections to avoid errors.
Create a friendly model alias
- Go to: Admin Settings → Models (or under the connection you just created)
- Edit the model name to gpt-oss-20b
- Save the model
Start chatting
- Open a new chat
- In the model dropdown, select gpt-oss-20b (the alias you just created)
- Send a test message
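The same test message can also be sent programmatically, since the llama.cpp server exposes an OpenAI-compatible chat-completions endpoint. The sketch below uses only the standard library; the base URL matches the server flags from step 4, the model id matches the /v1/models output (adjust it if yours differs), and the helper names are my own:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:10000/v1"  # matches the --host/--port flags from step 4

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions payload for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """Send a prompt to the local llama.cpp server and return the reply text."""
    payload = build_chat_request("models/openai_gpt-oss-20b-MXFP4.gguf", prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server from step 4 to be running):
# print(chat("Say hello in one sentence."))
```

This is handy for scripting against the model once you have confirmed everything works in the chat interface.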
Final Thoughts

Honestly, I did not expect that running everything with Python would be this easy. In the past, setting up llama.cpp meant cloning the repository, running CMake builds, and debugging endless errors, a painful process many of us are familiar with.
But with this approach, using the llama.cpp Python server together with Open WebUI, the setup worked out of the box. No mess, no complicated configuration, just a few simple commands.
In this tutorial, we:
- Set up a clean environment with uv
- Installed the llama.cpp Python server and Open WebUI
- Downloaded the quantized gpt-oss-20b model
- Served it locally and connected it to a ChatGPT-style interface
The result? A fully local, private, and optimized LLM setup that you can run on your own machine with minimal effort.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.