
# How Colab works
Google Colab is an incredibly powerful tool for data science, machine learning, and Python development because it removes the headache of local setup. However, one area that often confuses beginners, and sometimes even intermediate users, is file management.
Where do the files live? Why do they disappear? How do you upload, download, or permanently store data? This article answers each of these questions step by step.
Let’s get the biggest misconception out of the way right away. Google Colab doesn’t work like your laptop. Every time you open a notebook, Colab gives you a temporary virtual machine (VM). Once you leave, everything inside is cleared. This means:
- Files saved locally are temporary.
- When the runtime is reset, the files are deleted.
Your default working directory is:
/content
Anything you keep inside /content will disappear after the runtime is reset.
# Viewing files in Colab
You have two easy ways to view your files.
// Method 1: Using the visual method
Here is the recommended method for beginners:
- Check out the left sidebar.
- Click on the folder icon.
- Browse inside /content.
It’s great when you just want to see what’s going on.
// Method 2: Using the Python method
This is useful when you are scripting or debugging paths.
import os
os.listdir('/content')
# Uploading and downloading files
Suppose you have a dataset or comma-separated values (CSV) file on your laptop. The first method is to upload it using code.
from google.colab import files
files.upload()
A file picker opens, you select your file, and it appears in /content. This file is temporary until you move it elsewhere.
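As a minimal sketch of what you can do with the result: `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes, so a CSV can be parsed straight from memory. The helper name `rows_from_upload` and the sample file are hypothetical, and the simulated dictionary only stands in for a real upload so that the snippet also runs outside Colab.

```python
import csv
import io

def rows_from_upload(uploaded, filename, encoding="utf-8"):
    """Parse one uploaded CSV (raw bytes) into a list of row dictionaries."""
    text = uploaded[filename].decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))

try:
    from google.colab import files  # available only inside Colab
    uploaded = files.upload()       # opens the file picker
except ImportError:
    # Outside Colab: simulate the {filename: bytes} mapping for illustration.
    uploaded = {"sample.csv": b"name,score\nada,90\nlin,85\n"}

for name, content in uploaded.items():
    print(name, len(content), "bytes")
```

Parsing from the returned dictionary avoids depending on where the uploaded file landed on disk.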
Another method is drag and drop. This method is simple, but the storage is temporary.
- Open the file explorer (left panel).
- Drag and drop files directly into /content.
To download a file from Colab to your local machine:
from google.colab import files
files.download('model.pkl')
Your browser will immediately download the file. It works for CSVs, models, logs, and images.
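When there are several files to fetch, one option is to bundle them into a single ZIP archive first and download that. The sketch below assumes a hypothetical helper `bundle_folder` and uses a scratch folder in place of real outputs; the download call is guarded so the snippet also runs outside Colab.

```python
import shutil
import tempfile
from pathlib import Path

def bundle_folder(folder, archive_stem):
    """Zip the contents of `folder` into `<archive_stem>.zip`; return its path."""
    return Path(shutil.make_archive(str(archive_stem), "zip", root_dir=str(folder)))

# Demo with a scratch folder standing in for real outputs.
scratch = Path(tempfile.mkdtemp())
outputs = scratch / "outputs"
outputs.mkdir()
(outputs / "log.txt").write_text("epoch 1 done\n")

archive = bundle_folder(outputs, scratch / "outputs_bundle")
print(archive.name)

try:
    from google.colab import files  # available only inside Colab
    files.download(str(archive))
except ImportError:
    pass  # outside Colab there is no browser download to trigger
```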
If you want your files to survive runtime resets, you should use Google Drive. To mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Once you allow access, your Drive appears at:
/content/drive/MyDrive
Everything stored here is permanent.
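To make a temporary file permanent, it has to be copied into the mounted Drive folder. Here is a minimal sketch, assuming a hypothetical helper `persist_file`; in Colab the destination would be a folder under /content/drive/MyDrive, while scratch folders stand in for it here so the snippet runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path

def persist_file(src, dest_dir):
    """Copy `src` into `dest_dir`, creating the folder first if needed."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir))

# In Colab, dest_dir would be something like
# '/content/drive/MyDrive/backups' (hypothetical folder name).
work = Path(tempfile.mkdtemp())
src = work / "model.pkl"
src.write_bytes(b"demo")

saved = persist_file(src, work / "drive_stand_in" / "backups")
print(saved)
```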
# Recommended project folder structure
A messy Drive becomes painful very quickly. A neat structure you can reuse is:
MyDrive/
└── ColabProjects/
    └── My_Project/
        ├── data/
        ├── notebooks/
        ├── models/
        ├── outputs/
        └── README.md
To save time, you can define the following paths:
BASE_PATH = '/content/drive/MyDrive/ColabProjects/My_Project'
DATA_PATH = f'{BASE_PATH}/data/train.csv'
To save a file permanently using pandas:
import pandas as pd
df.to_csv('/content/drive/MyDrive/data.csv', index=False)  # df is an existing DataFrame
To load the file later:
df = pd.read_csv('/content/drive/MyDrive/data.csv')
# File management in Colab
// Working with ZIP files
To extract a ZIP file:
import zipfile
with zipfile.ZipFile('dataset.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/data')
// Using shell commands for file management
Colab supports Linux shell commands using the ! prefix.
!pwd
!ls
!mkdir data
!rm file.txt
!cp source.txt destination.txt
This is very useful for automation. Once you get used to it, you will use it more often.
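The ! prefix only works inside a notebook cell. When the same steps need to live inside a function or a plain script, the standard library offers equivalents. This is a rough sketch that mirrors the commands above in a scratch folder; the folder and file names are only for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())             # scratch folder for the demo

print(Path.cwd())                              # like !pwd
(workdir / "data").mkdir(exist_ok=True)        # like !mkdir data

src = workdir / "source.txt"
src.write_text("hello\n")
shutil.copy(src, workdir / "destination.txt")  # like !cp source.txt destination.txt
src.unlink()                                   # like !rm source.txt

print(sorted(os.listdir(workdir)))             # like !ls
```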
// Downloading files directly from the Internet
Instead of uploading manually, you can use wget, followed by the URL of the file:
!wget
Or use the Requests library in Python:
import requests
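r = requests.get(url)  # url is the address of the file to download
r.raise_for_status()  # stop early if the request failed
with open('data.csv', 'wb') as f:
    f.write(r.content)
This is highly efficient for datasets and pre-trained models.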
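For large files, holding the whole response in memory can be a problem. One alternative is to stream the download to disk in chunks. The sketch below swaps in the standard library's `urllib` instead of Requests so that it has no extra dependencies; `download_to` is a hypothetical helper, and a local file URI stands in for a real download link so the demo runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

def download_to(url, dest, chunk_size=1024 * 1024):
    """Stream `url` to `dest` in chunks; return the number of bytes written."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest.stat().st_size

# Offline demo: a local file URI stands in for a real download link.
demo_dir = Path(tempfile.mkdtemp())
source = demo_dir / "source.csv"
source.write_bytes(b"id,value\n1,10\n")

written = download_to(source.as_uri(), demo_dir / "data.csv")
print(written, "bytes written")
```

Inside Colab you would pass an https address and a destination under /content or your mounted Drive.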
# Additional considerations
// Storage limitations
You should be aware of the following limitations:
- Colab VM disk space is around 100 GB (temporary).
- Google Drive storage is limited by your personal quota.
- Browser-based uploads are limited to approximately 5 GB.
For large data sets, always plan ahead.
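Part of planning ahead is checking how much room is actually left before a big download or extraction. Here is a small sketch using `shutil.disk_usage`; the helper names and the 10% reserve are assumptions for illustration, not Colab rules.

```python
import shutil

GIB = 1024 ** 3

def free_space_gib(path="."):
    """Return the free space, in GiB, on the disk that holds `path`."""
    return shutil.disk_usage(path).free / GIB

def fits_on_disk(size_bytes, path=".", reserve_fraction=0.1):
    """True when `size_bytes` fits while keeping a share of free space in reserve."""
    usable = shutil.disk_usage(path).free * (1 - reserve_fraction)
    return size_bytes <= usable

print(f"Free space: {free_space_gib():.1f} GiB")
print("Room for a 5 GiB dataset:", fits_on_disk(5 * GIB))
```

In Colab, passing '/content' or '/content/drive/MyDrive' as `path` checks the VM disk or the mounted Drive respectively.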
// Best practices
- Mount the drive at the beginning of the notebook.
- Use variables for paths.
- Keep raw data read-only.
- Separate data, models, and outputs into separate folders.
- Add a README file for your future self.
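Several of these practices can be captured in one setup cell. The following is a minimal sketch, assuming a hypothetical helper `init_project` that creates the folders from the structure recommended earlier and returns them as path variables. A scratch folder is used here; in Colab you would pass a folder under your mounted Drive.

```python
import tempfile
from pathlib import Path

def init_project(base_path):
    """Create the project folders and a README; return the folder paths by name."""
    base = Path(base_path)
    paths = {name: base / name for name in ("data", "notebooks", "models", "outputs")}
    for folder in paths.values():
        folder.mkdir(parents=True, exist_ok=True)
    readme = base / "README.md"
    if not readme.exists():
        readme.write_text("# My_Project\n")  # placeholder content
    return paths

# In Colab: init_project('/content/drive/MyDrive/ColabProjects/My_Project')
base = Path(tempfile.mkdtemp()) / "My_Project"
paths = init_project(base)
print(sorted(paths))
```

Calling the helper again is harmless, because existing folders and an existing README are left untouched.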
// When not to use Google Drive
Avoid using Google Drive when:
- You are training on very large datasets.
- Fast I/O is critical to performance.
- You need distributed storage.
Alternatives you can use in these cases include:
# Final thoughts
Once you understand how Colab file management works, your workflow becomes much more efficient. There is no need to worry about missing files or rewriting code. With these tools, you can ensure clean experiments and smooth data transfers.
Kanwal Mehreen is a machine learning engineer and a technical writer with a deep passion for AI along with data science and medicine. She co-authored the e-book “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she is a champion of diversity and academic excellence. She has also been recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is a passionate advocate for change, having founded FEMCodes to empower women in STEM fields.