Run the Full DeepSeek-R1-0528 Model Locally

by SkillAiNest

Image by Author

DeepSeek-R1-0528, the latest update of DeepSeek's R1 reasoning model, requires 715GB of disk space, making it one of the largest open-source models available. However, thanks to the latest quantization techniques from Unsloth, the model's size can be reduced to 162GB, an 80% reduction. This lets users experience the full power of the model with far lower hardware requirements, albeit with a minor trade-off in performance.

In this tutorial, we will:

  1. Set up Ollama and Open Web UI to run the DeepSeek-R1-0528 model locally.
  2. Download and configure the 1.78-bit quantized version (IQ1_S) of the model.
  3. Run the model using both GPU + CPU and CPU-only setups.

Step 0: Prerequisites

To run the IQ1_S quantized version, your system must meet the following requirements:

GPU requirements: At least one 24GB GPU (for example, an NVIDIA RTX 4090 or A6000) and 128GB of RAM. With this setup, you can expect a generation speed of about 5 tokens/second.

RAM requirements: At least 64GB of RAM is required to run the model without a GPU, but performance will be limited to 1 token/second.

Optimal setup: For the best performance (5+ tokens/second), you need at least 180GB of unified memory, or a combined 180GB of RAM + VRAM.

Storage: Make sure you have at least 200GB of free disk space for the model and its dependencies.
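Before downloading anything, it can save time to verify that the machine actually meets these numbers. The snippet below is not part of the original tutorial, just a small Linux sanity check (it assumes `/proc/meminfo` and the standard `df`/`nvidia-smi` tools are available):

```shell
#!/bin/sh
# Rough pre-flight check against the requirements above.

# Free disk space on the current filesystem, in GB:
free_gb=$(df -Pk . | awk 'NR==2 {printf "%d", $4/1024/1024}')
echo "Free disk: ${free_gb} GB (need at least 200 GB)"

# Total system RAM, in GB:
ram_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
echo "Total RAM: ${ram_gb} GB (64 GB minimum, 128 GB+ recommended)"

# GPU name and VRAM, if an NVIDIA driver is present:
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
    echo "No NVIDIA GPU detected: expect CPU-only speeds (~1 token/s)"
fi
```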

Step 1: Install Dependencies and Ollama

Update your system and install the required tools. Ollama is a lightweight server for running large language models. Install it on Ubuntu using the following commands:

apt-get update
apt-get install pciutils -y
curl -fsSL https://ollama.com/install.sh | sh

Step 2: Download the model and run

Run the 1.78-bit quantized version (IQ1_S) of the DeepSeek-R1-0528 model using the following commands:

ollama serve &
ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0
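Once the model has finished loading, you can sanity-check it without any UI by calling Ollama's REST API, which listens on port 11434 by default. The snippet below is a sketch rather than part of the original tutorial (the prompt is arbitrary, and the guard simply skips the call when no server is running):

```shell
#!/bin/sh
# Request body for Ollama's /api/generate endpoint.
payload='{"model": "hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0", "prompt": "Why is the sky blue?", "stream": false}'

# Only send the request if an Ollama server is actually listening.
if curl -s -o /dev/null --max-time 2 http://localhost:11434; then
    curl -s http://localhost:11434/api/generate -d "$payload"
else
    echo "Ollama server is not running on port 11434"
fi
```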


Step 3: Set Up and Run Open Web UI

Pull the Open Web UI Docker image with CUDA support, then run the Open Web UI container with GPU support and Ollama integration.

This command will:

  • Start the Open Web UI server on port 8080 inside the container
  • Enable GPU acceleration using the --gpus all flag
  • Mount the required data directory (-v open-webui:/app/backend/data)

docker pull ghcr.io/open-webui/open-webui:cuda
docker run -d -p 9783:8080 --gpus all -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:cuda

Once the container is running, access the Open Web UI interface in your browser at http://localhost:9783/ (host port 9783 is mapped to the container's port 8080).
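If the page doesn't load, a generic Docker troubleshooting step (not from the original tutorial) is to confirm the container is actually up and to read its recent logs; the guards below make the script a harmless no-op on machines without Docker or the container:

```shell
#!/bin/sh
if command -v docker >/dev/null 2>&1; then
    # Is the container running, and which host port is mapped to it?
    docker ps --filter name=open-webui --format '{{.Names}} {{.Ports}} {{.Status}}' 2>/dev/null
    # Show the last 20 lines of the container's logs.
    docker logs --tail 20 open-webui 2>&1 || echo "open-webui container not found"
else
    echo "Docker is not installed on this machine"
fi
```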

Step 4: Running DeepSeek-R1-0528 in Open Web UI

Select the hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0 model from the model menu.


If the Ollama server fails to use the GPU properly, you can switch to CPU-only execution. Although this reduces performance significantly (to about 1 token/second), it ensures the model can still run.

# Kill any existing Ollama processes
pkill ollama 

# Check which processes are currently using the GPU
sudo fuser -v /dev/nvidia* 

# Restart Ollama service
CUDA_VISIBLE_DEVICES="" ollama serve

Once the model is running, you can interact with it through the Open Web UI. However, note that generation will be limited to about 1 token/second due to the lack of GPU acceleration.
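For completeness: setting CUDA_VISIBLE_DEVICES to an empty string is what hides the GPU from Ollama in the fallback above, so once the VRAM problem is resolved you can restore GPU inference by restarting the server with a device index visible again. A minimal sketch (guarded so it does nothing on machines without Ollama):

```shell
#!/bin/sh
# Stop the CPU-only instance; || true ignores "no process found".
pkill ollama || true

if command -v ollama >/dev/null 2>&1; then
    # Restart with GPU 0 visible to CUDA again.
    CUDA_VISIBLE_DEVICES=0 ollama serve &
else
    echo "ollama is not installed on this machine"
fi
```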


Final Thoughts

Even the quantized version was difficult to run. Downloading the model requires a fast internet connection, and if the download fails, you have to restart the entire process from the beginning. I also faced many problems trying to run it on my GPU, as I kept getting GGUF errors about low VRAM. Despite trying several common fixes for the GPU errors, nothing worked, so I finally switched everything to the CPU. Although it worked, it now takes about 10 minutes for the model to respond, which is far from ideal.

I am sure there are better solutions, perhaps using llama.cpp directly, but trust me, it took a full day just to get it running at all.

Abid Ali Awan (@1abidaliawan) is a certified data scientist who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a master's degree in technology management and a bachelor's degree in telecommunications engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
