https://www.youtube.com/watch?v=tqottksSSSSSYS
In this project walkthrough, we will explore how to use SQL to analyze data from a digital music store and answer important business questions. Working with the Chinook database, a sample database that represents a digital media store similar to iTunes, we will show how SQL can drive data-informed decision-making in a business context.
The Chinook database contains information about artists, albums, tracks, customers, and sales. Through strategic SQL queries, we will help the business understand its market trends, evaluate employee performance, and identify growth opportunities. This project demonstrates real-world SQL applications that data analysts face daily.
We will take you from basic exploratory analysis to advanced queries, writing increasingly complex SQL using common table expressions (CTEs) and subqueries.
What You’ll Learn
By the end of this tutorial, you will know how to:
- Navigate complex relational database schemas with multiple tables
- Write SQL queries that use joins to connect data across multiple tables
- Use common table expressions (CTEs) to organize complex queries
- Apply subqueries to calculate percentages and comparative metrics
- Analyze business data to provide actionable insights
- Integrate SQL queries with Python in Jupyter
Before You Start: Pre-Instruction
To get the most out of this project, follow these preparatory steps:
- Review the project
- Prepare your environment
  - If you are using the Dataquest platform, everything has already been configured for you
  - If you are working locally, you will need a Jupyter environment with the SQL extension and the chinook.db SQLite file used in the setup below
- Get comfortable with SQL fundamentals
  - You should be familiar with the basic SQL keywords: SELECT, FROM, GROUP BY, and JOIN
  - Some experience with CTEs and subqueries will be helpful, but it is not required
- New to Markdown? We recommend learning the basics: Markdown Guide
Setting Up Your Environment
Before we dive into our analysis, let’s set up our Jupyter environment to work with SQL. We will use some SQL magic commands that allow us to write SQL directly in Jupyter cells.
%%capture
%load_ext sql
%sql sqlite:///chinook.db
Learning Insight: The %%capture magic command suppresses any output messages from the cell, keeping our notebook clean. The %load_ext sql command loads the SQL extension, and %sql sqlite:///chinook.db connects us to our database.
Now let’s confirm our connection and discover which tables are available in our database:
%%sql
SELECT name
FROM sqlite_master
WHERE type='table';
This SQLite-specific query lists the names of all tables in our database. The Chinook database contains 11 tables that represent various aspects of a digital music store:
- album: Album details
- artist: Artist information
- customer: Customer information, including the assigned support representative
- employee: Store employees, including sales support agents
- genre: Music genres
- invoice: Sales transactions
- invoice_line: Individual items within each invoice
- media_type: Format types (MP3, AAC, etc.)
- playlist: Curated playlists
- playlist_track: Tracks in each playlist
- track: Song information
Understanding the Database Schema
Working with relational databases means understanding how tables connect to each other. The Chinook database uses primary and foreign keys to establish these relationships. Here is a simplified view of the key relationships between the tables we are working with:
- customer is connected to employee through support_rep_id
- invoice is connected to customer through customer_id
- invoice_line is connected to invoice through invoice_id
- track is connected to album, invoice_line, and genre through album_id, track_id, and genre_id, respectively
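To make these key relationships concrete, here is a minimal sketch using Python’s built-in sqlite3 module and a toy in-memory database. The tables are heavily simplified stand-ins for the Chinook ones, and every row ('Rep A', 'Cust X', and so on) is invented for illustration; only the key column names come from the list above.

```python
import sqlite3

# Toy in-memory database; tables and rows are illustrative, not real Chinook data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    last_name TEXT,
    support_rep_id INTEGER REFERENCES employee(employee_id)
);
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    total REAL
);
INSERT INTO employee VALUES (1, 'Rep A'), (2, 'Rep B');
INSERT INTO customer VALUES (10, 'Cust X', 1), (11, 'Cust Y', 2);
INSERT INTO invoice VALUES (100, 10, 5.0), (101, 10, 3.0), (102, 11, 2.0);
""")

# Follow the chain invoice -> customer -> employee through the foreign keys.
rows = conn.execute("""
    SELECT i.invoice_id, c.last_name, e.last_name
    FROM invoice i
    JOIN customer c ON i.customer_id = c.customer_id
    JOIN employee e ON c.support_rep_id = e.employee_id
    ORDER BY i.invoice_id
""").fetchall()

for row in rows:
    print(row)
```

Each JOIN condition pairs a foreign key with the primary key it points to, which is the same pattern the queries later in this walkthrough rely on.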
Let’s preview some of our key tables to understand the data we are working with:
%%sql
SELECT *
FROM track
LIMIT 5;
track_id | name | album_id | media_type_id | genre_id | composer | milliseconds | bytes | unit_price |
---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 |
4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman | 252051 | 4331779 | 0.99 |
5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 |
%%sql
SELECT *
FROM invoice_line
LIMIT 5;
invoice_line_id | invoice_id | track_id | unit_price | quantity |
---|---|---|---|---|
1 | 1 | 1158 | 0.99 | 1 |
2 | 1 | 1159 | 0.99 | 1 |
3 | 1 | 1160 | 0.99 | 1 |
4 | 1 | 1161 | 0.99 | 1 |
5 | 1 | 1162 | 0.99 | 1 |
Learning Insight: When working with a new database, always preview your tables to understand the data structure before writing complex queries. Adding LIMIT lets you identify column names, data types, and potential relationships without flooding your output with hundreds of rows.
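Alongside a LIMIT preview, SQLite can report column names and declared types directly through PRAGMA table_info. The sketch below runs it from Python’s sqlite3 module against a simplified stand-in for invoice_line; the declared column types are assumptions, not taken from the real chinook.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for invoice_line; the declared types are assumptions.
conn.execute("""
    CREATE TABLE invoice_line (
        invoice_line_id INTEGER PRIMARY KEY,
        invoice_id INTEGER,
        track_id INTEGER,
        unit_price NUMERIC,
        quantity INTEGER
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(invoice_line)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<16} {col_type:<8} primary_key={bool(pk)}")
```

In the notebook, the same statement could be run in a %%sql cell against chinook.db to check the real column definitions.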
Business Question 1: Which Music Genres Should We Focus on in the USA?
The Chinook store wants to understand which music genres are most popular in the United States market. This information will help them decide which new albums to add to their catalog. Let’s build a query to analyze genre popularity by sales.
Building Our Analysis with a CTE
We will use a common table expression (CTE) to create a temporary result set that combines data from multiple tables.
%%sql
WITH genre_usa_tracks AS (
SELECT
il.invoice_line_id,
g.name AS genre,
t.track_id,
i.billing_country AS country
FROM track t
JOIN genre g ON t.genre_id = g.genre_id
JOIN invoice_line il ON t.track_id = il.track_id
JOIN invoice i ON il.invoice_id = i.invoice_id
WHERE i.billing_country = 'USA'
)
SELECT
genre,
COUNT(*) AS tracks_sold,
COUNT(*) * 100.0 / (SELECT COUNT(*) FROM genre_usa_tracks) AS percentage
FROM genre_usa_tracks
GROUP BY genre
ORDER BY tracks_sold DESC;
genre | tracks_sold | percentage |
---|---|---|
Rock | 561 | 53.37773549000951 |
Alternative & Punk | 130 | 12.369172216936251 |
Metal | 124 | 11.798287345385347 |
R&B/Soul | 53 | 5.042816365366318 |
Blues | 36 | 3.4253092293054235 |
Alternative | 35 | 3.330161750713606 |
Latin | 22 | 2.093244529019981 |
Pop | 22 | 2.093244529019981 |
Hip Hop/Rap | 20 | 1.9029495718363463 |
Jazz | 14 | 1.3320647002854424 |
Easy Listening | 13 | 1.236917221693625 |
Reggae | 6 | 0.570884871550904 |
Electronica/Dance | 5 | 0.47573739295908657 |
Classical | 4 | 0.38058991436726924 |
Heavy Metal | 3 | 0.28544243575452 |
Soundtrack | 2 | 0.19029495718363462 |
TV Shows | 1 | 0.09514747859181731 |
Learning Insight: CTEs break complex queries into logical steps and make them easier to read. Here, we first build a filtered dataset of USA purchases, then analyze it. Multiplying by 100.0 in our calculation ensures that we get decimal results instead of integer division.
Our results show that Rock dominates the USA market with more than 50% of sales, followed by Alternative & Punk and Metal. This suggests that the store should prioritize these genres when choosing new inventory.
Key Insights from the Genre Analysis
- Rock dominates: With 561 tracks sold (53.4%), Rock is by far the most popular genre
- A clear second tier: Alternative & Punk (12.4%) and Metal (11.8%) follow, together accounting for roughly a quarter of sales
- Long-tail effect: Many genres hold very small percentages, suggesting niche markets
Business Question 2: Analyzing Employee Sales Performance
The company wants to evaluate the performance of its sales support agents to identify top performers and areas for improvement. Let’s analyze which employees generate the most revenue.
%%sql
SELECT
e.first_name || ' ' || e.last_name AS employee_name,
e.hire_date,
COUNT(DISTINCT c.customer_id) AS customer_count,
SUM(i.total) AS total_sales_dollars,
SUM(i.total) / COUNT(DISTINCT c.customer_id) AS avg_dollars_per_customer
FROM customer c
JOIN invoice i ON c.customer_id = i.customer_id
JOIN employee e ON c.support_rep_id = e.employee_id
GROUP BY e.employee_id, e.hire_date
ORDER BY total_sales_dollars DESC;
employee_name | hire_date | customer_count | total_sales_dollars | avg_dollars_per_customer |
---|---|---|---|---|
Jane Peacock | 2017-04-01 00:00:00 | 21 | 1731.5100000000039 | 82.45285714285733 |
Margaret Park | 2017-05-03 00:00:00 | 20 | 1584.0000000000034 | 79.20000000000017 |
Steve Johnson | 2017-10-17 00:00:00 | 18 | 1393.920000000002 | 77.440000000011 |
Learning Insight: When using GROUP BY with aggregate functions, remember that most SQL flavors require every non-aggregated column to appear in the GROUP BY clause (though SQLite is more forgiving). The || operator concatenates strings in SQLite.
Performance Analysis Results
Our analysis reveals some interesting patterns:
- Jane Peacock leads on every metric shown: the most customers (21), the highest total sales, and the highest average dollars per customer
- Margaret Park delivers solid performance, with metrics close to Jane’s, suggesting a consistent level of customer value
- Steve Johnson, the most recent hire, shows promising performance, with metrics comparable to those of more experienced staff
Business Question 3: Combining SQL with Python for Visualization
While SQL handles data retrieval and transformation, combining it with Python enables powerful visualizations. Let’s show how to pass the results of a SQL query to Python:
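import pandas as pd
# Store our query as a string. The CTE is repeated here because a CTE
# only exists for the statement that defines it.
query = """
WITH genre_usa_tracks AS (
SELECT g.name AS genre
FROM track t
JOIN genre g ON t.genre_id = g.genre_id
JOIN invoice_line il ON t.track_id = il.track_id
JOIN invoice i ON il.invoice_id = i.invoice_id
WHERE i.billing_country = 'USA'
)
SELECT
genre,
COUNT(*) AS tracks_sold
FROM genre_usa_tracks
GROUP BY genre
ORDER BY tracks_sold DESC
LIMIT 10;
"""
# Execute the query and store results
result = %sql $query
# Convert to pandas DataFrame
df = result.DataFrame()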
Learning Insight: The %sql inline magic (single percent sign) lets us execute SQL and capture the results in Python. The dollar-sign syntax ($query) lets us reference Python variables inside the SQL magic.
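Outside a notebook, where the %sql magic is unavailable, a similar hand-off from SQL to Python can be sketched with the standard-library sqlite3 module. The sales table and its rows below are invented for illustration; against the real database you would connect to chinook.db instead of an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be accessed by column name

# Hypothetical toy table standing in for the USA sales data.
conn.executescript("""
CREATE TABLE sales (genre TEXT);
INSERT INTO sales VALUES ('Rock'), ('Rock'), ('Metal'), ('Rock'), ('Pop'), ('Metal');
""")

query = """
SELECT genre, COUNT(*) AS tracks_sold
FROM sales
GROUP BY genre
ORDER BY tracks_sold DESC, genre
LIMIT 10;
"""

# Turn each result row into a plain dict that any Python library can consume.
records = [dict(row) for row in conn.execute(query)]
print(records)
```

If pandas is installed, pd.DataFrame(records) or pd.read_sql_query(query, conn) should yield a DataFrame comparable to the one produced by result.DataFrame() in the notebook.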
Challenges and Considerations
During our analysis, we encountered several important SQL concepts worth highlighting:
1. The Integer Division Pitfall
When calculating percentages, SQLite performs integer division by default when both operands are integers:
-- This returns 0 for all percentages
SELECT COUNT(*) / (SELECT COUNT(*) FROM table) AS percentage
-- This returns proper decimals
SELECT COUNT(*) * 100.0 / (SELECT COUNT(*) FROM table) AS percentage
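The difference is easy to reproduce on a tiny table. This sketch uses Python’s sqlite3 module with an invented four-row sales table (three 'Rock' rows, one 'Metal' row) to compare the two expressions side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented data: 3 Rock sales and 1 Metal sale.
conn.executescript("""
CREATE TABLE sales (genre TEXT);
INSERT INTO sales VALUES ('Rock'), ('Rock'), ('Rock'), ('Metal');
""")

# Integer / integer: the fractional part is discarded.
int_rows = conn.execute("""
    SELECT genre, COUNT(*) / (SELECT COUNT(*) FROM sales)
    FROM sales GROUP BY genre ORDER BY genre
""").fetchall()

# Multiplying by 100.0 first makes the numerator a float.
float_rows = conn.execute("""
    SELECT genre, COUNT(*) * 100.0 / (SELECT COUNT(*) FROM sales)
    FROM sales GROUP BY genre ORDER BY genre
""").fetchall()

print(int_rows)    # [('Metal', 0), ('Rock', 0)]
print(float_rows)  # [('Metal', 25.0), ('Rock', 75.0)]
```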
2. JOIN Selection Matters
We used INNER JOIN because we only wanted records that exist in all related tables. If we needed to include customers without invoices, we would use LEFT JOIN instead.
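As a small illustration of that difference, the sketch below builds two toy tables with Python’s sqlite3 module, where one invented customer has invoices and the other has none, and counts invoices per customer with each join type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented data: customer 1 has two invoices, customer 2 has none.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Has invoices'), (2, 'No invoices');
INSERT INTO invoice VALUES (100, 1, 4.0), (101, 1, 6.0);
""")

template = """
    SELECT c.name, COUNT(i.invoice_id) AS invoice_count
    FROM customer c
    {join_type} JOIN invoice i ON c.customer_id = i.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
"""

inner = conn.execute(template.format(join_type="INNER")).fetchall()
left = conn.execute(template.format(join_type="LEFT")).fetchall()

print(inner)  # [('Has invoices', 2)]
print(left)   # [('Has invoices', 2), ('No invoices', 0)]
```

COUNT(i.invoice_id) ignores the NULLs that LEFT JOIN produces for unmatched customers, which is why the second customer shows a count of 0 rather than 1.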
3. Subquery Performance
Our percentage calculation uses a scalar subquery that, depending on the database engine, may be evaluated again for every output row. For larger datasets, consider using window functions or calculating the total once in a CTE in advance.
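One way to pre-calculate the total is sketched below with Python’s sqlite3 module: a one-row CTE holds the overall count and is cross-joined to the detail rows. The sales table and the CTE name total are hypothetical, chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented data: 3 Rock sales and 1 Metal sale.
conn.executescript("""
CREATE TABLE sales (genre TEXT);
INSERT INTO sales VALUES ('Rock'), ('Rock'), ('Rock'), ('Metal');
""")

rows = conn.execute("""
    WITH total AS (
        SELECT COUNT(*) AS n FROM sales   -- computed once, yields a single row
    )
    SELECT s.genre,
           COUNT(*) AS tracks_sold,
           COUNT(*) * 100.0 / total.n AS percentage
    FROM sales s
    CROSS JOIN total
    GROUP BY s.genre, total.n
    ORDER BY tracks_sold DESC
""").fetchall()

print(rows)  # [('Rock', 3, 75.0), ('Metal', 1, 25.0)]
```

A window-function variant would replace the CTE with COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), which requires SQLite 3.25 or later.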
Sharing Your Work with GitHub Gists
GitHub Gists provide a good way to share your SQL projects without the complexity of full repositories. Here is how to share your work:
- Navigate to gist.github.com
- Create a new gist
- Name your file with the .ipynb extension for Jupyter notebooks or .sql for SQL scripts
- Paste your code and create either a public or secret gist
Gists automatically render Jupyter notebooks with all their outputs, making them ideal for sharing analysis results with stakeholders or including in your project portfolio.
Analysis Summary
In this project, we have shown how SQL can answer important business questions for a digital music store:
- Genre analysis: We identified Rock (53.4%) as the dominant genre in the US market, with Alternative & Punk (12.4%) and Metal (11.8%) well behind in second and third place
- Employee performance: We evaluated the sales representatives and found that Jane Peacock leads in average revenue per customer
- Technical skills: We applied CTEs, subqueries, multi-table joins, and aggregate functions to solve real business problems
These insights enable data-driven decisions about inventory management, employee training, and market strategy.
Next Steps
To extend this analysis and deepen your SQL skills, consider these challenges:
- Time-based analysis: How do sales trends change over time? Add date filtering to identify seasonal patterns
- Customer segmentation: Which customers are most valuable? Create customer segments based on purchase behavior
- Product recommendations: Which tracks are usually purchased together? Use self-joins to find associations
- International markets: Extend the genre analysis to compare preferences across countries
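As a starting point for the product-recommendation challenge, here is one possible shape for the self-join, run through Python’s sqlite3 module on an invented invoice_line table. It is a sketch of the general technique rather than a finished solution for the Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented purchases: three invoices sharing some of tracks 10, 11, and 12.
conn.executescript("""
CREATE TABLE invoice_line (
    invoice_line_id INTEGER PRIMARY KEY,
    invoice_id INTEGER,
    track_id INTEGER
);
INSERT INTO invoice_line (invoice_id, track_id) VALUES
    (1, 10), (1, 11), (1, 12),
    (2, 10), (2, 11),
    (3, 11), (3, 12);
""")

# Join the table to itself on invoice_id; a.track_id < b.track_id keeps
# each pair once and prevents a track from pairing with itself.
pairs = conn.execute("""
    SELECT a.track_id, b.track_id, COUNT(*) AS times_together
    FROM invoice_line a
    JOIN invoice_line b
      ON a.invoice_id = b.invoice_id
     AND a.track_id < b.track_id
    GROUP BY a.track_id, b.track_id
    ORDER BY times_together DESC, a.track_id, b.track_id
""").fetchall()

print(pairs)  # [(10, 11, 2), (11, 12, 2), (10, 12, 1)]
```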
If you are new to SQL and found this project challenging, start with our SQL Fundamentals course to build the basic skills needed for complex analysis. The course covers essential topics such as joins, aggregations, and subqueries, all of which we used throughout this project.
Congratulations!