Photo by author
# Introduction
The ability to gather high-quality, relevant information is still a core skill for any data professional. While there are many ways to collect data, one of the most powerful and reliable ways is through APIs (Application Programming Interfaces). They act as bridges, allowing different software systems to communicate and share data seamlessly.
In this article, we’ll break down the essentials of using APIs to collect data – why they matter, how they work, and how to get started with them in Python.
# What is an API?
An API (Application Programming Interface) is a set of rules and protocols that allow different software systems to communicate and exchange data efficiently.
Think of it like eating at a restaurant. Instead of speaking directly to the chef, you place your order with the waiter. The waiter checks if the ingredients are available, sends the request to the kitchen, and brings your food back when it’s ready.
This is how an API works: it receives your request for specific data, checks if the data exists, and returns it if it is available – serving as a messenger between you and the data source.
When using an API, the interaction typically includes the following components:
- Client: The application or system that sends a request to access data or functionality
- Request: The client sends a structured request to the server, specifying what data it needs
- Server: The system that processes the request and provides the required data or performs an action
- Response: The server processes the request and sends back the data or result in a structured format, usually JSON or XML


Photo by author
These communications allow applications to share information or functionality efficiently, enabling tasks such as fetching data from databases or interacting with third-party services.
# Why use APIs to collect data?
APIs offer several advantages for data collection:
- Efficiency: They provide direct access to data, eliminating the need for manual data collection
- Real-time access: APIs often provide up-to-date information, which is essential for time-sensitive analyses
- Automation: They enable automated data retrieval processes, reducing human intervention and potential errors.
- Scalability: APIs can handle large volumes of requests, making them suitable for extensive data collection tasks.
# Implementing API calls in Python
Making a basic API call in Python is a simple and highly practical way to get started with data collection. The popular requests library makes it easy to send HTTP requests and handle responses.
To demonstrate how this works, we will use the Random User Generator API, a free service that provides dummy user data in JSON format, perfect for testing and learning.
Here’s a step-by-step guide to making your first API call in Python.
## Installing the requests library
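If you do not have them yet, you can install requests (and pandas, which we will use later to build a DataFrame) from the terminal:
pip install requests pandas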
## Importing the required libraries
import requests
import pandas as pd
## Checking out the documentation page
Before making a request, it’s important to understand how the API works. This includes reviewing available endpoints, parameters and response structure. Start by visiting the Random User Generator API documentation.
## Defining the API endpoint and parameters
Based on the documentation, we can build a simple request. In this example, we fetch user data limited to users in the United States:
url="
params = {'nat': 'us'}// Request to receive:
Use requests.get() with the URL and the parameters:
response = requests.get(url, params=params)
## Handling the response
Check if the request was successful, then process the data:
if response.status_code == 200:
    data = response.json()
    # Process the data as needed
else:
    print(f"Error: {response.status_code}")
## Converting the data to a DataFrame
To work with the data easily, we can convert it to a Pandas DataFrame:
data = response.json()
df = pd.json_normalize(data["results"])
df
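The normalized frame flattens the nested JSON into dot-separated columns. As a quick sketch (column names such as name.first or location.country are assumptions based on the standard Random User response fields), you could keep just a few readable ones:
# Keep a few readable columns from the normalized response
cols = ["name.first", "name.last", "gender", "location.country", "email"]
df[cols]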
Now, let’s illustrate this with a real case.
# Working with the Eurostat API
Eurostat is the Statistical Office of the European Union. It provides high-quality, harmonized statistics on a wide range of topics such as economics, demography, environment, industry and tourism, covering all EU member states.
Through its API, Eurostat offers public access to a wide collection of datasets in machine-readable formats, making it a valuable resource for data professionals, researchers and developers interested in analyzing European-level data.
## Step 0: Understanding the data in the API
If you check the data section of Eurostat, you will find a navigation tree. We can try to identify some data of interest in the following subsections:
- Detailed datasets: Complete Eurostat data in multidimensional format
- Selected datasets: Simple datasets with fewer indicators in 2–3 dimensions
- EU Policies: Data is grouped by specific EU policy areas
- Crosscutting: Thematic data compiled from multiple sources
## Step 1: Checking the documentation
Always start with the documentation. You can find Eurostat’s API guide on the Eurostat website. It describes the API structure, available endpoints, and methods for constructing valid requests.
## Step 2: Creating the first API request
To generate an API request in Python, the first step is to install and import the requests library. Remember, we already installed it in the earlier example. After that, we can easily create a request using a demo dataset, as suggested in the Eurostat documentation.
# We import the requests library
import requests
# Define the URL endpoint -> Eurostat's statistics dissemination API; NAMA_10_GDP (GDP and main components) serves here as a demo dataset
url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/NAMA_10_GDP?format=JSON&lang=EN"
# Make the GET request
response = requests.get(url)
# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response
Pro tip: we can split the URL into a base URL and parameters, to make it clearer what data we are requesting from the API.
# We import the requests library
import requests
# Define the URL endpoint -> the same demo dataset as above, now without the query string
url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/NAMA_10_GDP"
# Define the parameters -> We define the parameters to add in the URL.
params = {
    'format': 'JSON',  # Ask for the response in JSON-stat format
    'lang': 'EN'       # Specify the language as English
}
# Make the GET request
response = requests.get(url, params=params)
# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response
## Step 3: Determining which dataset to call
Instead of using the demo dataset, you can select any dataset from the Eurostat database. For example, let’s query the dataset TOUR_OCC_ARN2, which contains tourist accommodation data.
# We import the requests library
import requests
# Define the base URL of the Eurostat statistics API and the dataset code to append to it
base_url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
dataset = "TOUR_OCC_ARN2"
url = base_url + dataset
# Define the parameters -> We define the parameters to add in the URL.
params = {
    'format': 'JSON',  # Ask for the response in JSON-stat format
    'lang': 'EN'       # Specify the language as English
}
# Make the GET request -> we generate the request and obtain the response
response = requests.get(url, params=params)
# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response
## Step 4: Understanding the response
Eurostat’s API returns data in JSON-STAT format, a standard for multidimensional statistical data. You can save the response to a file and explore its structure:
import requests
import json
# Define the URL endpoint and dataset
base_url = "
dataset = "TOUR_OCC_ARN2"
url = base_url + dataset
# Define the parameters to add in the URL
params = {
    'format': 'JSON',  # Ask for the response in JSON-stat format
    'lang': 'EN',      # Specify the language as English
    'time': 2019       # Restrict the query to the year 2019
}
# Make the GET request and obtain the response
response = requests.get(url, params=params)
# Check the status code and handle the response
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()
    # Generate a JSON file and write the response data into it
    with open("eurostat_response.json", "w") as json_file:
        json.dump(data, json_file, indent=4)  # Save JSON with pretty formatting
    print("JSON file 'eurostat_response.json' has been successfully created.")
else:
print(f"Error: Received status code {response.status_code} from the API.")// Step 5: Converting the answer into usable data:
## Step 5: Converting the response into usable data
Now that we have the data, we can convert it into a tabular format (CSV) to streamline the analysis process.
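Before converting it, it helps to know which parts of the JSON-stat response the script below relies on. Roughly (keys abbreviated; the exact dimension names depend on the dataset):
# Main JSON-stat keys used below:
# data["id"]         -> ordered list of dimension names (for this dataset they include nace_r2, geo and time)
# data["dimension"]  -> per-dimension metadata: category codes and human-readable labels
# data["value"]      -> {flat_index: value}, one entry per combination of dimension categories,
#                       with the last dimension in data["id"] varying fastest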
import requests
import pandas as pd
# Step 1: Make the GET request to the Eurostat API
base_url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
dataset = "TOUR_OCC_ARN2"  # Tourist accommodation statistics dataset
url = base_url + dataset
params = {'format': 'JSON', 'lang': 'EN'}  # Request the data as JSON-stat, in English
# Make the API request
response = requests.get(url, params=params)
# Step 2: Check if the request was successful
if response.status_code == 200:
    data = response.json()
    # Step 3: Extract the dimensions and metadata
    dimensions = data['dimension']
    dimension_order = data['id']  # e.g. ['geo', 'time', 'unit', 'indic', ...]
    # Extract labels for each dimension dynamically
    dimension_labels = {dim: dimensions[dim]['category']['label'] for dim in dimension_order}
    # Step 4: Determine the size of each dimension
    dimension_sizes = {dim: len(dimensions[dim]['category']['index']) for dim in dimension_order}
    # Step 5: Create a mapping for each index to its respective label
    # For example, if we have 'geo', 'time', 'unit' and 'indic', map each index to the correct label
    index_labels = {
        dim: list(dimension_labels[dim].keys())
        for dim in dimension_order
    }
    # Step 6: Create a list of rows for the CSV
    rows = []
    for key, value in data['value'].items():
        # `key` is a string like '123'; we need to break it down into the corresponding labels
        index = int(key)  # Convert string index to integer
        # Calculate the indices for each dimension
        indices = {}
        for dim in reversed(dimension_order):
            dim_index = index % dimension_sizes[dim]
            indices[dim] = index_labels[dim][dim_index]
            index //= dimension_sizes[dim]
        # Construct a row with labels from all dimensions
        row = {f"{dim.capitalize()} Code": indices[dim] for dim in dimension_order}
        row.update({f"{dim.capitalize()} Name": dimension_labels[dim][indices[dim]] for dim in dimension_order})
        row["Value (Tourist Accommodations)"] = value
        rows.append(row)
    # Step 7: Create a DataFrame and save it as CSV
    if rows:
        df = pd.DataFrame(rows)
        csv_filename = "eurostat_tourist_accommodation.csv"
        df.to_csv(csv_filename, index=False)
        print(f"CSV file '{csv_filename}' has been successfully created.")
    else:
        print("No valid data to save as CSV.")
else:
    print(f"Error: Received status code {response.status_code} from the API.")
## Step 6: Filtering the data
Imagine we only want to keep the records corresponding to camping grounds, short-stay accommodation, or hotels. We can apply this condition to build a final table and get a pandas DataFrame we can work with.
# Check the unique values in the 'Nace_r2 Name' column
set(df["Nace_r2 Name"])
# List of options to filter
options = ['Camping grounds, recreational vehicle parks and trailer parks',
           'Holiday and other short-stay accommodation',
           'Hotels and similar accommodation']
# Filter the DataFrame based on whether the 'Nace_r2 Name' column values are in the options list
df = df[df["Nace_r2 Name"].isin(options)]
df
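From here the usual pandas workflow applies. As a quick sketch (the column names below follow the naming produced by the conversion script above, so adjust them if your dimensions differ), you could aggregate the accommodation values by country:
# Total reported value per country, across the selected accommodation types
df.groupby("Geo Name")["Value (Tourist Accommodations)"].sum().sort_values(ascending=False)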
# Best practices when working with APIs
- Read the documentation: Always check the official API documentation to understand the endpoints and parameters
- Handle errors: Use conditionals and logging to gracefully handle failed requests
- Respect rate limits: Avoid overwhelming the server – check if rate limits apply
- Secure credentials: If the API requires authentication, never expose your API keys in public code (see the sketch after this list)
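A common way to keep keys out of your code is to read them from an environment variable. A minimal sketch (MY_API_KEY, the header format and the endpoint are placeholders; real APIs document their own authentication scheme):
import os
import requests

api_key = os.environ.get("MY_API_KEY")  # Set this in your shell, not in the script
headers = {"Authorization": f"Bearer {api_key}"}  # Header format depends on the API
response = requests.get("https://api.example.com/data", headers=headers)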
# Wrap up
Eurostat’s API is a powerful gateway to a wealth of structured, high-quality European statistics. By learning how to navigate its structure, query datasets, and interpret the responses, you can access critical data for analysis, research, or decision-making.
You can check the relevant code in my GitHub repository.
Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and currently works in data science applied to human mobility. He is a part-time content creator focused on data science and technology. Josep also writes about all things AI, covering the applications of the ongoing explosion in the field.