SQL for Data Analysts: Essential Queries for Data Extraction and Transformation


# Introduction

Data analysts need to work with large amounts of information stored in databases. Before they can create reports or find insights, they must first pull the right data and prepare it for use. This is where SQL (Structured Query Language) comes in. SQL is a tool that helps analysts retrieve data, clean it and organize it in the desired format.

In this article, we’ll look at the most important SQL queries that every data analyst should know.

# 1. Selecting data with SELECT

The `SELECT` statement is the foundation of SQL. You can list specific columns or use `*` to return all available fields.

```sql
SELECT name, age, salary FROM employees;
```

This query pulls only the `name`, `age`, and `salary` columns from the `employees` table.
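The following is a minimal sketch that contrasts an explicit column list with `SELECT *`. It uses Python's built-in `sqlite3` module and a hypothetical in-memory `employees` table whose rows are made up for illustration. It reads the returned column names from the cursor's `description` attribute.

```python
import sqlite3

# Hypothetical in-memory table with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER, name TEXT, age INTEGER, salary REAL, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [(1, "Ana", 28, 52000, "Finance"), (2, "Ben", 41, 67000, "Sales")],
)

# Explicit column list: only these three columns come back, in this order.
cur = conn.execute("SELECT name, age, salary FROM employees ORDER BY id")
selected_cols = [col[0] for col in cur.description]
selected_rows = cur.fetchall()

# SELECT * returns every column defined on the table.
all_cols = [col[0] for col in conn.execute("SELECT * FROM employees").description]

print(selected_cols)  # ['name', 'age', 'salary']
print(all_cols)       # ['id', 'name', 'age', 'salary', 'department']
print(selected_rows)  # [('Ana', 28, 52000.0), ('Ben', 41, 67000.0)]
```

Naming columns explicitly also keeps a query's output stable if someone later adds columns to the table.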

# 2. Filtering data with WHERE

The `WHERE` clause keeps only the rows that match your conditions. It supports comparison operators (`=`, `<`, `>`, and so on) and logical operators (`AND`, `OR`, `NOT`) to build precise filters.

```sql
SELECT * FROM employees WHERE department = 'Finance';
```

The `WHERE` clause returns only the employees who belong to the Finance department.
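The sketch below combines a comparison operator with a logical operator. It also binds filter values through `?` placeholders rather than pasting them into the SQL text. It assumes Python's `sqlite3` module and a small hypothetical `employees` table.

```python
import sqlite3

# Hypothetical sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Finance", 52000),
        ("Ben", "Finance", 71000),
        ("Cleo", "Sales", 64000),
        ("Dev", "Finance", 48000),
    ],
)

# Comparison (=, >) and logical (AND) operators combined in one filter.
# The string literal uses single quotes, as standard SQL expects.
finance_over_50k = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employees "
        "WHERE department = 'Finance' AND salary > 50000 ORDER BY name"
    )
]

# From application code, bind values with ? placeholders instead of
# building the SQL string by concatenation.
sales_or_high = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employees WHERE department = ? OR salary > ? ORDER BY name",
        ("Sales", 70000),
    )
]

print(finance_over_50k)  # ['Ana', 'Ben']
print(sales_or_high)     # ['Ben', 'Cleo']
```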

# 3. Sorting results with ORDER BY

The `ORDER BY` clause sorts query results in ascending (`ASC`, the default) or descending (`DESC`) order. It is used to rank records by numeric, text, or date values.

```sql
SELECT name, salary FROM employees ORDER BY salary DESC;
```

This query sorts the employees by salary in descending order, so the highest-paid employees appear first.
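`ORDER BY` also accepts several sort keys, each with its own direction. The next key is used only to break ties in the previous one. Here is a small sketch of that behavior, assuming Python's `sqlite3` module and a hypothetical `employees` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Finance", 52000),
        ("Ben", "Finance", 71000),
        ("Cleo", "Sales", 64000),
        ("Dev", "Sales", 58000),
    ],
)

# Sort by department A-Z first; within each department, highest salary first.
rows = conn.execute(
    "SELECT department, name, salary FROM employees "
    "ORDER BY department ASC, salary DESC"
).fetchall()

for department, name, salary in rows:
    print(department, name, salary)
# Finance Ben 71000.0
# Finance Ana 52000.0
# Sales Cleo 64000.0
# Sales Dev 58000.0
```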

# 4. Removing duplicates with DISTINCT

The `DISTINCT` keyword returns only unique values from a column, or unique combinations when several columns are listed. This is useful when building clean lists of categories or attributes.

```sql
SELECT DISTINCT department FROM employees;
```

`DISTINCT` removes duplicate entries, returning each department name only once.
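The following is a minimal sketch showing how `DISTINCT` behaves on one column versus several columns. It also shows `COUNT(DISTINCT ...)`, which counts the unique values instead of listing them. It assumes Python's `sqlite3` module and a hypothetical `employees` table with an illustrative `city` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Finance", "London"),
        ("Ben", "Finance", "London"),
        ("Cleo", "Sales", "Paris"),
        ("Dev", "Sales", "London"),
    ],
)

# One column: each department appears once.
departments = [r[0] for r in conn.execute(
    "SELECT DISTINCT department FROM employees ORDER BY department")]

# Several columns: duplicates are judged on the whole combination.
dept_city = conn.execute(
    "SELECT DISTINCT department, city FROM employees ORDER BY department, city"
).fetchall()

# COUNT(DISTINCT ...) counts unique values instead of listing them.
n_departments = conn.execute(
    "SELECT COUNT(DISTINCT department) FROM employees").fetchone()[0]

print(departments)    # ['Finance', 'Sales']
print(dept_city)      # [('Finance', 'London'), ('Sales', 'London'), ('Sales', 'Paris')]
print(n_departments)  # 2
```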

# 5. Limiting results with LIMIT

The `LIMIT` clause caps the number of rows a query returns. It is often paired with `ORDER BY` to show top results or to sample data from large tables. Not every database uses `LIMIT`; SQL Server, for example, uses `TOP` instead.

```sql
SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
```

Combining `ORDER BY` with `LIMIT` retrieves the five highest-paid employees.
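`LIMIT` is often combined with `OFFSET` to page through a sorted result. Without `ORDER BY`, which rows land on which page is not guaranteed. The sketch below assumes Python's `sqlite3` module and a hypothetical table. The `salary_page` helper is an illustrative name, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 52000), ("Ben", 71000), ("Cleo", 64000), ("Dev", 58000), ("Eli", 80000)],
)

def salary_page(page_number, page_size=2):
    """Hypothetical helper: one page of names, highest salary first."""
    offset = (page_number - 1) * page_size
    rows = conn.execute(
        "SELECT name FROM employees ORDER BY salary DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return [r[0] for r in rows]

print(salary_page(1))  # ['Eli', 'Ben']
print(salary_page(2))  # ['Cleo', 'Dev']
print(salary_page(3))  # ['Ana']
```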

# 6. Aggregating with GROUP BY

The `GROUP BY` clause groups rows that share the same values in the specified columns. It is used with aggregate functions such as `SUM()`, `AVG()`, or `COUNT()` to produce summaries.

```sql
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
```

`GROUP BY` organizes rows by department, and `AVG(salary)` calculates the average salary for each group.
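A single grouped query can compute several aggregates at once, producing one output row per group. The following sketch assumes Python's `sqlite3` module and hypothetical salary data. It uses `ROUND()` only to keep the average readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Finance", 52000),
        ("Ben", "Finance", 71000),
        ("Cleo", "Sales", 64000),
        ("Dev", "Sales", 58000),
        ("Eli", "Sales", 80000),
    ],
)

# One output row per department; each aggregate is computed within that group.
rows = conn.execute(
    """
    SELECT department,
           COUNT(*)              AS num_employees,
           SUM(salary)           AS total_salary,
           ROUND(AVG(salary), 2) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY department
    """
).fetchall()

for row in rows:
    print(row)
```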

# 7. Filtering groups with HAVING

The `HAVING` clause filters grouped results after aggregation. It is used when conditions depend on aggregate values, such as sums or averages, which `WHERE` cannot reference.

```sql
SELECT department, COUNT(*) AS num_employees
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
```

The query counts the number of employees in each department and then filters to keep only departments with more than 10 employees.
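`WHERE` and `HAVING` can appear in the same query, and they act at different stages. `WHERE` removes individual rows before grouping, while `HAVING` removes whole groups afterwards. The sketch below illustrates this with Python's `sqlite3` module and hypothetical data. The `num_well_paid` alias and the 50000 threshold are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Finance", 52000), ("Ben", "Finance", 71000), ("Fay", "Finance", 45000),
        ("Cleo", "Sales", 64000), ("Dev", "Sales", 48000),
        ("Gus", "IT", 90000), ("Hal", "IT", 85000), ("Ivy", "IT", 60000),
    ],
)

# WHERE runs first and drops individual rows (salary <= 50000);
# HAVING runs after GROUP BY and drops whole groups.
well_paid_teams = conn.execute(
    """
    SELECT department, COUNT(*) AS num_well_paid
    FROM employees
    WHERE salary > 50000
    GROUP BY department
    HAVING COUNT(*) >= 2
    ORDER BY department
    """
).fetchall()

# Without the WHERE filter, every department has at least two rows.
all_teams = conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    "GROUP BY department HAVING COUNT(*) >= 2 ORDER BY department"
).fetchall()

print(well_paid_teams)  # [('Finance', 2), ('IT', 3)]
print(all_teams)        # [('Finance', 3), ('IT', 3), ('Sales', 2)]
```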

# 8. Joining tables with JOIN

The `JOIN` clause combines rows from two or more tables based on a related column. It helps retrieve associated data, such as employees together with their departments.

```sql
SELECT e.name, d.name AS department
FROM employees e
JOIN departments d ON e.dept_id = d.id;
```

Here, `JOIN` (an inner join by default) matches each employee with their department name. Employees without a matching `dept_id` are left out of the result.
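The difference between an inner join and a `LEFT JOIN` shows up as soon as some rows have no match. The following is a minimal sketch with Python's `sqlite3` module. The `employees` and `departments` rows are hypothetical, and include one employee with no department and one with a dangling `dept_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "Finance"), (2, "Sales"), (3, "Marketing")])
# Cleo has no department yet; Dev points at an id that does not exist.
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ana", 1), ("Ben", 2), ("Cleo", None), ("Dev", 99)])

# Inner join: only employees with a matching department survive.
inner = conn.execute(
    """
    SELECT e.name, d.name AS department
    FROM employees e
    JOIN departments d ON e.dept_id = d.id
    ORDER BY e.name
    """
).fetchall()

# LEFT JOIN keeps every employee and fills missing department columns with NULL.
left = conn.execute(
    """
    SELECT e.name, d.name AS department
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.name
    """
).fetchall()

print(inner)  # [('Ana', 'Finance'), ('Ben', 'Sales')]
print(left)   # [('Ana', 'Finance'), ('Ben', 'Sales'), ('Cleo', None), ('Dev', None)]
```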

# 9. Combining results with UNION

`UNION` combines the results of two or more queries into a single result set. It removes duplicate rows automatically unless you use `UNION ALL`, which keeps them. Each query must return the same number of columns with compatible types.

```sql
SELECT name FROM employees
UNION
SELECT name FROM customers;
```

This query combines the names from the `employees` and `customers` tables into a single list.
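The next sketch shows `UNION` and `UNION ALL` side by side on a name that appears in both tables. It assumes Python's `sqlite3` module and hypothetical `employees` and `customers` tables. A trailing `ORDER BY` sorts the combined result, not just the last query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT)")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [("Ana",), ("Ben",), ("Cleo",)])
conn.executemany("INSERT INTO customers VALUES (?)", [("Ben",), ("Dina",)])

# UNION removes the duplicate "Ben"; the final ORDER BY applies to the combined result.
union_names = [r[0] for r in conn.execute(
    "SELECT name FROM employees UNION SELECT name FROM customers ORDER BY name")]

# UNION ALL keeps every row from both queries, duplicates included.
union_all_names = [r[0] for r in conn.execute(
    "SELECT name FROM employees UNION ALL SELECT name FROM customers ORDER BY name")]

print(union_names)      # ['Ana', 'Ben', 'Cleo', 'Dina']
print(union_all_names)  # ['Ana', 'Ben', 'Ben', 'Cleo', 'Dina']
```

`UNION ALL` skips the duplicate-removal step, so it is usually the cheaper choice when duplicates are impossible or wanted.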

# 10. String functions

String functions in SQL are used to manipulate and transform text data. They help with things like concatenating names, changing case, trimming spaces, or extracting parts of a string.

```sql
SELECT CONCAT(first_name, ' ', last_name) AS full_name,
       LENGTH(first_name) AS name_length
FROM employees;
```

This query creates a full name by concatenating the first and last names and calculates the length of the first name.
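The following sketch applies several string functions to deliberately messy data, using Python's `sqlite3` module and a hypothetical `employees` table. Only recent SQLite releases provide `CONCAT()`, so the sketch uses the standard `||` concatenation operator instead. `TRIM`, `LENGTH`, `UPPER`, and `SUBSTR` are SQLite built-ins. The output column aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT)")
# Deliberately messy values with stray spaces.
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(" Ana ", "Lopez"), ("Benjamin", "Ng ")])

rows = conn.execute(
    """
    SELECT TRIM(first_name) || ' ' || TRIM(last_name) AS full_name,
           LENGTH(TRIM(first_name))                   AS name_length,
           UPPER(TRIM(last_name))                     AS last_upper,
           SUBSTR(TRIM(first_name), 1, 3)             AS short_name
    FROM employees
    ORDER BY TRIM(first_name)
    """
).fetchall()

for row in rows:
    print(row)
# ('Ana Lopez', 3, 'LOPEZ', 'Ana')
# ('Benjamin Ng', 8, 'NG', 'Ben')
```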

# 11. Date and time functions

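Date and time functions in SQL let you work with temporal data for analysis and reporting. They can calculate differences, extract components such as years or months, and shift dates by adding or subtracting intervals. For example, `DATEDIFF()` combined with `CURRENT_DATE` can measure durations. The query below uses MySQL's `DATEDIFF(end_date, start_date)`; other databases spell this differently, and SQL Server's `DATEDIFF`, for instance, takes a date-part argument first.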

```sql
SELECT name,
       hire_date,
       DATEDIFF(CURRENT_DATE, hire_date) AS days_at_company
FROM employees;
```

This calculates how many days each employee has been with the company as of today.
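SQLite has no `DATEDIFF()`. The following sketch shows rough equivalents of the three operations described above, using Python's `sqlite3` module:

- a difference in days via `julianday()`
- year extraction via `strftime()`
- interval arithmetic via a `date()` modifier

The hire dates, the fixed reference date, and the `probation_end` column are hypothetical. The fixed date keeps the output reproducible; `date('now')` would give the real current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, hire_date TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ana", "2024-01-01"), ("Ben", "2023-06-01")])

# A fixed "today" keeps the output reproducible; use date('now') for the real date.
as_of = "2024-06-01"
rows = conn.execute(
    """
    SELECT name,
           CAST(julianday(?) - julianday(hire_date) AS INTEGER) AS days_at_company,
           strftime('%Y', hire_date)                             AS hire_year,
           date(hire_date, '+90 days')                           AS probation_end
    FROM employees
    ORDER BY name
    """,
    (as_of,),
).fetchall()

for row in rows:
    print(row)
# ('Ana', 152, '2024', '2024-03-31')
# ('Ben', 366, '2023', '2023-08-30')
```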

# 12. Creating new columns with CASE

`CASE` expressions create new columns with conditional logic, similar to if-else statements. They let you categorize or transform data dynamically within your queries.

```sql
SELECT name,
       CASE
           WHEN age < 30 THEN 'Junior'
           WHEN age BETWEEN 30 AND 50 THEN 'Mid-level'
           ELSE 'Senior'
       END AS experience_level
FROM employees;
```

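The `CASE` expression creates a new column called `experience_level` based on age thresholds. Conditions are checked in order, and `BETWEEN` is inclusive, so ages 30 and 50 both count as 'Mid-level'.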
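Boundary values and `NULL`s are where `CASE` logic most often surprises people. The sketch below runs the same expression against hypothetical ages with Python's `sqlite3` module. Because a `NULL` age fails every `WHEN` test, it falls through to `ELSE`. The `'Unknown'` branch in the second query is an illustrative fix, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 25), ("Ben", 30), ("Cleo", 45), ("Dev", 50), ("Eli", 51), ("Fay", None)],
)

# Same CASE logic as the query above.
levels = dict(conn.execute(
    """
    SELECT name,
           CASE
               WHEN age < 30 THEN 'Junior'
               WHEN age BETWEEN 30 AND 50 THEN 'Mid-level'
               ELSE 'Senior'
           END
    FROM employees
    """
).fetchall())

# A NULL age fails every WHEN test and falls through to ELSE, so Fay is
# labelled 'Senior'. Checking for NULL first avoids that.
levels_safe = dict(conn.execute(
    """
    SELECT name,
           CASE
               WHEN age IS NULL THEN 'Unknown'
               WHEN age < 30 THEN 'Junior'
               WHEN age BETWEEN 30 AND 50 THEN 'Mid-level'
               ELSE 'Senior'
           END
    FROM employees
    """
).fetchall())

print(levels)
print(levels_safe["Fay"])  # Unknown
```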

# 13. Handling missing values with COALESCE

`COALESCE` handles missing values by returning the first non-null value from its list of arguments. It is commonly used to replace `NULL` fields with default values such as 'N/A'.

```sql
SELECT name, COALESCE(phone, 'N/A') AS contact_number FROM customers;
```

Here, `COALESCE` replaces missing phone numbers with 'N/A'.
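Because `COALESCE` takes any number of arguments, it can express a fallback chain, such as phone, then email, then a default. An empty string is not `NULL`, however. Wrapping a column in `NULLIF(col, '')` converts empty strings to `NULL` so the chain can skip them. Here is a minimal sketch with Python's `sqlite3` module; the `email` column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, phone TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "555-0100", "ana@example.com"),
        ("Ben", None, "ben@example.com"),
        ("Cleo", None, None),
        ("Dev", "", None),  # empty string, not NULL
    ],
)

rows = conn.execute(
    """
    SELECT name,
           COALESCE(phone, email, 'N/A')             AS contact,
           COALESCE(NULLIF(phone, ''), email, 'N/A') AS contact_clean
    FROM customers
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
# ('Ana', '555-0100', '555-0100')
# ('Ben', 'ben@example.com', 'ben@example.com')
# ('Cleo', 'N/A', 'N/A')
# ('Dev', '', 'N/A')
```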

# 14. Subqueries

Subqueries are queries nested inside another query to supply intermediate results. They can appear in `WHERE`, `FROM`, or `SELECT` clauses to filter, compare, or build datasets dynamically.

```sql
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
```

This query returns the employees whose salary is above the company-wide average, which the nested subquery computes.
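The sketch below shows subqueries in two of the positions mentioned above. The first is a correlated subquery in `WHERE`, which compares each employee with their own department's average instead of the company's. The second is a derived table in `FROM`. It assumes Python's `sqlite3` module. The data is hypothetical and chosen so that the two comparisons give different answers. The `dept_avg` alias is illustrative; some databases require a derived table to have one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Finance", 40000), ("Ben", "Finance", 50000),
        ("Cleo", "Sales", 70000), ("Dev", "Sales", 60000), ("Eli", "Sales", 80000),
    ],
)

# Subquery in WHERE: compare against the company-wide average (60000 here).
above_company = [r[0] for r in conn.execute(
    """
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
    """
)]

# Correlated subquery: conceptually evaluated for each outer row,
# using that row's department.
above_department = [r[0] for r in conn.execute(
    """
    SELECT e.name FROM employees e
    WHERE e.salary > (SELECT AVG(salary) FROM employees
                      WHERE department = e.department)
    ORDER BY e.name
    """
)]

# Subquery in FROM (a derived table), then filtered like an ordinary table.
high_avg_departments = conn.execute(
    """
    SELECT department, avg_salary
    FROM (SELECT department, AVG(salary) AS avg_salary
          FROM employees GROUP BY department) AS dept_avg
    WHERE avg_salary > 50000
    """
).fetchall()

print(above_company)         # ['Cleo', 'Eli']
print(above_department)      # ['Ben', 'Eli']
print(high_avg_departments)  # [('Sales', 70000.0)]
```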

# 15. Window functions

Window functions perform calculations across a set of related rows while still returning each individual row. They are commonly used for ranking, running totals, and comparing values between rows.

```sql
SELECT name,
       salary,
       RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
```

The `RANK()` function assigns each employee a rank based on salary without collapsing the rows into groups. Employees with equal salaries share a rank, and the following rank is skipped.
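The following sketch makes the tie behavior concrete by putting `RANK()` next to `DENSE_RANK()`, `ROW_NUMBER()`, and a running total from `SUM() OVER`. It assumes Python's `sqlite3` module backed by SQLite 3.25 or later, which is when window functions were added. The salaries and column aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 90000), ("Ben", 80000), ("Cleo", 80000), ("Dev", 70000)],
)

rows = conn.execute(
    """
    SELECT name,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           -- name breaks ties so ROW_NUMBER and the running total are deterministic
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           SUM(salary)  OVER (ORDER BY salary DESC, name
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                                                          AS running_total
    FROM employees
    ORDER BY salary DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('Ana', 1, 1, 1, 90000)
# ('Ben', 2, 2, 2, 170000)
# ('Cleo', 2, 2, 3, 250000)
# ('Dev', 4, 3, 4, 320000)
```

`RANK()` leaves a gap after the tie (4), `DENSE_RANK()` does not (3), and `ROW_NUMBER()` never repeats.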

# Conclusion

Mastering SQL is an extremely valuable skill for any data analyst, as it provides the foundation for extracting, transforming, and interpreting data. From filtering and aggregation to joining and combining datasets, SQL empowers analysts to turn raw data into meaningful insights that drive decision-making. By mastering these essential queries, analysts not only streamline their workflow but also improve the accuracy and scalability of their analyses.

Jayta Gulti is a machine learning enthusiast and technical writer driven by his passion for building machine learning models. He holds a Master’s degree in Computer Science from the University of Liverpool.
