

Image by author
Have you ever wondered if there is a better way to install and run llama.cpp locally? Today, almost every local large language model (LLM) application depends on llama.cpp as the backbone for running models. But here's the catch: most setups are either overly complicated, require multiple tools, or don't give you a powerful user interface out of the box.
Wouldn't it be great if you could:
- Run a powerful model like gpt-oss-20b with just a few commands
- Get a modern web UI instantly, without extra hassle
- Have a fast, well-optimized setup for local inference
That is exactly what this tutorial is about.
In this guide, we will walk through the simplest and fastest way to run the gpt-oss-20b model locally using the llama-cpp-python package together with Open WebUI. By the end, you will have a fully working local LLM environment that is simple, efficient, and ready for production.
1. Setting Up the Environment

If you already have the uv command installed, your life just got easier. If not, don't worry: you can install it quickly by following the official uv installation guide.
Once uv is installed, open your terminal and install Python 3.12:
uv python install 3.12
Next, let’s set up a project directory, create a virtual environment, and activate it:
mkdir -p ~/gpt-oss && cd ~/gpt-oss
uv venv .venv --python 3.12
source .venv/bin/activate
2. Installing the Python Packages

Now that your environment is ready, let's install the required packages.
First, upgrade pip to the latest version. Then install the llama-cpp-python server package. This build comes with CUDA support (for NVIDIA GPUs), so if you have a compatible GPU, you will get maximum performance.
uv pip install --upgrade pip
uv pip install "llama-cpp-python[server]" --extra-index-url
Finally, install Open WebUI and the Hugging Face Hub client:
uv pip install open-webui huggingface_hub
- Open WebUI: a ChatGPT-style web interface for your local LLM server
- huggingface_hub: downloads and manages models directly from Hugging Face
3. Downloading the gpt-oss-20b Model

Next, let's download the gpt-oss-20b model in a quantized format (MXFP4) from Hugging Face. Quantized models are optimized for lower memory usage while still maintaining strong performance, which makes them ideal for running locally.
Run the following command in your terminal:
huggingface-cli download bartowski/openai_gpt-oss-20b-GGUF openai_gpt-oss-20b-MXFP4.gguf --local-dir models
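Once the download finishes, it is worth sanity-checking the file before serving it. Every GGUF file starts with the four magic bytes "GGUF", so a quick stdlib-only check can catch a truncated or corrupted download. The helper name `looks_like_gguf` and the path below are illustrative, not part of the tutorial's tooling:

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # every valid GGUF file begins with these four bytes

def looks_like_gguf(path) -> bool:
    """Quick sanity check that a downloaded file is a plausible GGUF model."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC

# Example (path matching the download command above):
# looks_like_gguf("models/openai_gpt-oss-20b-MXFP4.gguf")
```

This only validates the header, not the full file, but it is enough to catch the most common failure mode of an interrupted download.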
4. Serving gpt-oss-20b Locally Using llama.cpp

Now that the model has been downloaded, let's serve it using the llama.cpp server.
Run the following command in your terminal:
python -m llama_cpp.server \
--model models/openai_gpt-oss-20b-MXFP4.gguf \
--host 127.0.0.1 --port 10000 \
--n_ctx 16384
Here is what each flag does:
- --model: path to your quantized model file
- --host: local host address (127.0.0.1)
- --port: port number (10000 in this case)
- --n_ctx: context length (16,384 tokens for long conversations)
If everything is working, you will see logs like these:
INFO: Started server process [16470]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:10000 (Press CTRL+C to quit)
To confirm that the server is running and the model is available, run:
curl http://127.0.0.1:10000/v1/models
Expected output:
{"object":"list","data":[{"id":"models/openai_gpt-oss-20b-MXFP4.gguf","object":"model","owned_by":"me","permissions":[]}]}
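If you prefer to check from Python instead of curl, here is a small stdlib-only sketch that parses the /v1/models response and confirms the GGUF file is listed. The helper names (`model_ids`, `server_has_model`) are illustrative, and the base URL assumes the host and port used above:

```python
import json
import urllib.request

def model_ids(models_json: str) -> list:
    """Extract the model ids from a /v1/models response body."""
    data = json.loads(models_json)
    return [entry["id"] for entry in data.get("data", [])]

def server_has_model(base_url: str, name_fragment: str) -> bool:
    """Return True if any served model id contains name_fragment."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        ids = model_ids(resp.read().decode("utf-8"))
    return any(name_fragment in mid for mid in ids)

# Example (requires the server from this step to be running):
# server_has_model("http://127.0.0.1:10000", "gpt-oss-20b")
```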
Next, we will connect it to Open WebUI to give it a ChatGPT-style interface.
5. Launching Open WebUI

We have already installed the open-webui package. Now, let's launch it.
Open a new terminal window (keep your llama.cpp server running in the first one) and run:
open-webui serve --host 127.0.0.1 --port 9000
This will start the web UI server.
When you first open the link in your browser, you will be prompted to:
- Create an admin account (using your email and password)
- Log in to access the dashboard
This admin account keeps your settings, connections, and models secure across future sessions.
6. Connecting Open WebUI to llama.cpp

By default, Open WebUI is configured to work with Ollama. Since we are running our model with llama.cpp, we need to adjust the settings.
Follow these steps within the Web UI:
Add llama.cpp as an OpenAI connection
- Open the WebUI at http://127.0.0.1:9000 (or your forwarded URL).
- Click your avatar (top-right corner) → Admin Settings.
- Go to: Connections → OpenAI Connections.
- Edit the existing connection:
  - Base URL: http://127.0.0.1:10000/v1
  - API Key: (leave blank)
- Save the connection.
- (Optional) Disable the Ollama API and Direct Connections to avoid errors.
Create a friendly model alias
- Go to: Admin Settings → Models (or under the connection you just created)
- Edit the model name to gpt-oss-20b
- Save the model
Start chatting
- Open a new chat
- In the model dropdown, select gpt-oss-20b (the alias you just created)
- Send a test message
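The same test message can also be sent programmatically, since the llama.cpp server exposes an OpenAI-compatible chat-completions endpoint. The sketch below uses only the standard library; the base URL matches the server flags from step 4, the model id matches the /v1/models output (adjust it if yours differs), and the helper names are my own:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:10000/v1"  # matches the --host/--port flags from step 4

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions payload for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """Send a prompt to the local llama.cpp server and return the reply text."""
    payload = build_chat_request("models/openai_gpt-oss-20b-MXFP4.gguf", prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server from step 4 to be running):
# print(chat("Say hello in one sentence."))
```

This is handy for scripting against the model once you have confirmed everything works in the chat interface.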
Final Thoughts

Honestly, I did not expect that running everything with Python would be this easy. In the past, setting up llama.cpp meant cloning the repository, running CMake builds, and debugging endless errors, a painful process many of us are familiar with.
But with this approach, using the llama.cpp Python server together with Open WebUI, the setup worked out of the box. No mess, no complicated configuration, just a few simple commands.
In this tutorial, we:
- Set up a clean environment with uv
- Installed the llama.cpp Python server and Open WebUI
- Downloaded the quantized gpt-oss-20b model
- Served it locally and connected it to a ChatGPT-style interface
The result? A fully local, private, and optimized LLM setup that you can run on your own machine with minimal effort.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.