Take advantage of pandas and SQL for effective data analysis

Photo by Author | Canva

Pandas and SQL are both effective for data analysis, but what if we could combine their strengths? With pandasql, you can write SQL queries directly in a Jupyter Notebook. This integration lets us combine SQL logic with Python for effective data analysis without switching tools.

In this article, we will use both Pandas and SQL together on the Uber data project. Let’s start!

. What is pandasql?

pandasql loads any DataFrame into an in-memory SQLite engine, so you can write pure SQL inside a Python environment.
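To make the in-memory idea concrete, here is a minimal sketch that performs the same round trip by hand with Python's built-in sqlite3 module and pandas. It is only an illustration of the concept, not pandasql's actual implementation, and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, not taken from the article's dataset
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cy"], "Rating": [4.9, 4.5, 4.8]})

# Copy the DataFrame into a throwaway in-memory SQLite database
conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)

# Query it with plain SQL and read the result back as a DataFrame
result = pd.read_sql_query(
    "SELECT Name FROM df WHERE Rating >= 4.7 ORDER BY Name", conn
)
conn.close()

print(result)
```

pandasql wraps this kind of round trip behind a single function call, which is why the queries in this article can refer to a DataFrame by its variable name.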

. Benefits of using pandas and SQL together

SQL is useful for easily filtering rows, aggregating data, or applying multi-condition logic.
Python, on the other hand, offers advanced tools for statistical analysis and custom computations, along with set-based operations, which go beyond SQL’s capabilities.
When used together, SQL simplifies data selection, while Python adds analytical flexibility.

. How to run pandasql inside a Jupyter Notebook?

To run pandasql inside a Jupyter Notebook, start with the following code.

import pandas as pd
from pandasql import sqldf
run = lambda q: sqldf(q, globals())

Next, you can run your SQL code like this:

run("""
SELECT *
FROM df
LIMIT 10;
""")

We will use the run function throughout this article without showing its definition each time.

Let’s see how SQL and pandas work together in a real-life project from Uber.

. Real-World Project: Analyzing Uber Driver Performance Data

Image by Author

In this data project, Uber asks us to analyze driver performance data and evaluate bonus strategies.

!! Data exploration and analytics

Now, let’s explore the dataset. First, we will load the data.

!! Initial dataset loading

Let’s just load the dataset using pandas.

import pandas as pd
import numpy as np
df = pd.read_csv('dataset_2.csv')

!! Data exploration

Now let’s review the data.

Output looks like this:

Now we have a glimpse of the data.
As you can see, the dataset includes each driver’s name, the number of trips completed, their acceptance rate (i.e., the percentage of trip requests accepted), their total supply hours (total hours spent online), and their average rating.
Let’s confirm the column names before starting the data analysis so we can use them correctly.

Here is an output.

As you can see, there are five columns in our dataset, and there are no missing values.
Now let’s answer the questions using both SQL and Python.
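The exploration code itself does not appear above, so the following is only a sketch of how such checks are commonly done in pandas, using head(), columns, and isna(). The rows are hypothetical stand-ins that mimic the five columns described in this section; the real data comes from dataset_2.csv.

```python
import pandas as pd

# Hypothetical rows with the five columns described above (not the real data)
df = pd.DataFrame({
    "Name": ["Driver A", "Driver B", "Driver C"],
    "Trips Completed": [15, 8, 12],
    "Accept Rate": [95, 85, 100],
    "Supply Hours": [10, 6, 9],
    "Rating": [4.8, 4.9, 4.6],
})

# Glimpse of the first rows
print(df.head())

# Confirm the exact column names before writing SQL against them
print(df.columns.tolist())

# Count missing values per column
missing = df.isna().sum()
print(missing)
```

Checking the exact spelling matters here because column names that contain spaces have to be quoted in the SQL queries that follow.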

. Question 1: Who qualifies for Bonus Option 1?

In the first question, we are asked to determine the total bonus payout for Option 1, which is:

$50 for each driver who is online for at least 8 hours, accepts 90% of requests, completes 10 trips, and has a rating of 4.7 or better during the time frame.

!! Step 1: Filtering qualifying drivers with SQL (pandasql)

At this stage, we will start using pandasql.

In the following code, we select all the drivers who meet the conditions for the Option 1 bonus, using the WHERE clause and the AND operator to link several conditions. To learn how to use WHERE and AND, refer to the documentation.

opt1_eligible = run("""
    SELECT Name                -- keep only a name column for clarity
    FROM   df
    WHERE  `Supply Hours`    >=  8
      AND  `Trips Completed` >= 10
      AND  `Accept Rate`     >= 90
      AND  Rating            >= 4.7;
""")
opt1_eligible

Here is an output.

Output: drivers eligible for Option 1
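For comparison, the same Option 1 filter can be written as a pandas boolean mask. This is a sketch on hypothetical drivers rather than the article's dataset, and it assumes Accept Rate is stored as a number.

```python
import pandas as pd

# Hypothetical drivers (not the article's data)
df = pd.DataFrame({
    "Name": ["Driver A", "Driver B", "Driver C", "Driver D"],
    "Trips Completed": [15, 8, 12, 10],
    "Accept Rate": [95, 85, 100, 90],
    "Supply Hours": [10, 6, 9, 8],
    "Rating": [4.8, 4.9, 4.6, 4.7],
})

# Each condition mirrors one line of the WHERE clause; & plays the role of AND
mask = (
    (df["Supply Hours"] >= 8)
    & (df["Trips Completed"] >= 10)
    & (df["Accept Rate"] >= 90)
    & (df["Rating"] >= 4.7)
)

# Keep only the Name column, as the SQL query does
opt1_eligible = df.loc[mask, ["Name"]]
print(opt1_eligible)
```

Both forms express the same logic. The SQL version tends to read more naturally when several conditions are chained, which is the reason for using it in this step.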

!! Step 2: Finishing in pandas

After filtering the dataset using SQL with pandasql, we switch to pandas to perform numerical calculations and finalize the analysis. This hybrid technique, which combines SQL and Python, improves both readability and flexibility.

Next, using the following Python code, we calculate the total payout by multiplying the number of eligible drivers (obtained with len()) by the $50 bonus per driver. Check out the documentation to see how you can use the len() function.

payout_opt1 = 50 * len(opt1_eligible)
print(f"Option 1 payout: ${payout_opt1:,}")

Here is an output.

. Question 2: Calculating the total payout for Bonus Option 2

In the second question, we are asked to find the total bonus payout using Option 2:

$4/trip for all drivers who complete 12 trips and have a rating of 4.7 or better.

!! Step 1: Filtering qualifying drivers with SQL (pandasql)

First, we use SQL to filter for drivers who meet the Option 2 criteria: completing at least 12 trips and maintaining a rating of 4.7 or higher.

# Grab only the rows that satisfy the Option-2 thresholds
opt2_drivers = run("""
    SELECT Name,
           `Trips Completed`
    FROM   df
    WHERE  `Trips Completed` >= 12
      AND  Rating            >= 4.7;
""")
opt2_drivers.head()

This is what we get.

!! Step 2: Finishing the calculation in pure pandas

Now let’s perform the calculation using pandas. The code computes the total bonus by summing the Trips Completed column with sum() and then multiplying the result by the $4 bonus per trip.

total_trips   = opt2_drivers["Trips Completed"].sum()
option2_bonus = 4 * total_trips
print(f"Total trips: {total_trips},  Option-2 payout: ${option2_bonus}")

Here is the result.
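As a design note, the filtering and the aggregation could also be done in a single SQL statement. The sketch below uses Python's built-in sqlite3 module directly instead of pandasql so that it runs without extra packages, and the rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical drivers (not the article's data)
df = pd.DataFrame({
    "Name": ["Driver A", "Driver B", "Driver C", "Driver D"],
    "Trips Completed": [15, 8, 12, 12],
    "Rating": [4.8, 4.9, 4.6, 4.7],
})

conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)

# Filter and aggregate in one query: $4 for every qualifying trip
result = pd.read_sql_query(
    """
    SELECT 4 * SUM(`Trips Completed`) AS payout
    FROM   df
    WHERE  `Trips Completed` >= 12
      AND  Rating            >= 4.7;
    """,
    conn,
)
conn.close()

payout = int(result.loc[0, "payout"])
print(f"Option-2 payout: ${payout}")
```

Splitting the work as this section does (SQL for selection, pandas for arithmetic) keeps the intermediate table of qualifying drivers available for inspection, which a single aggregate query does not.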

. Question 3: Identify drivers who qualify for option 1 but not option 2

In the third question, we are asked to count the number of drivers who qualify for option 1 but not for option 2.

!! Step 1: Building two eligibility tables with SQL (pandasql)

In the following SQL code, we build two datasets: one for drivers who meet the Option 1 criteria and another for those who meet the Option 2 criteria.

# All Option-1 drivers
opt1_drivers = run("""
    SELECT Name
    FROM   df
    WHERE  `Supply Hours`    >=  8
      AND  `Trips Completed` >= 10
      AND  `Accept Rate`     >= 90
      AND  Rating            >= 4.7;
""")

# All Option-2 drivers
opt2_drivers = run("""
    SELECT Name
    FROM   df
    WHERE  `Trips Completed` >= 12
      AND  Rating            >= 4.7;
""")

!! Step 2: Using Python set logic to find the difference

Next, we will use Python to identify the drivers who appear in Option 1 but not in Option 2, using set operations.

The code is:

only_opt1 = set(opt1_drivers["Name"]) - set(opt2_drivers["Name"])
count_only_opt1 = len(only_opt1)

print(f"Drivers qualifying for Option 1 but not Option 2: {count_only_opt1}")

Here is an output.

By combining these methods, we take advantage of SQL for filtering and of Python’s set logic for comparing datasets.
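The set difference used in this step can be seen in isolation with a few hypothetical names:

```python
# Hypothetical name lists standing in for the two eligibility tables
opt1_names = {"Driver A", "Driver D", "Driver E"}
opt2_names = {"Driver A", "Driver F"}

# The - operator keeps elements of the left set that are absent from the right
only_opt1 = opt1_names - opt2_names

print(sorted(only_opt1))
print(len(only_opt1))
```

One caveat worth keeping in mind: a set stores each value once, so if two different drivers shared the same name they would be counted as a single driver. This approach therefore assumes that names are unique in the dataset.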

. Question 4: Finding low-performance drivers with high ratings

In question 4, we are asked to determine the percentage of drivers who completed fewer than 10 trips, had an acceptance rate below 90%, and still maintained a rating of 4.7 or higher.

!! Step 1: Building a subset with SQL (pandasql)

In the following code, we select all the drivers who have completed fewer than 10 trips, have an acceptance rate below 90%, and have a rating of at least 4.7.

low_kpi_df = run("""
    SELECT *
    FROM   df
    WHERE  `Trips Completed` < 10
      AND  `Accept Rate`     < 90
      AND  Rating            >= 4.7;
""")
low_kpi_df

Here is an output.

!! Step 2: Calculating percentage in plain pandas

At this stage, we will use Python to calculate the percentage of such drivers.

We divide the number of filtered drivers by the total driver count, then multiply by 100 to get the percentage.

The code is:

num_low_kpi   = len(low_kpi_df)
total_drivers = len(df)
percentage    = round(100 * num_low_kpi / total_drivers, 2)

print(f"{num_low_kpi} out of {total_drivers} drivers ⇒ {percentage}%")

Here is an output.
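A pandas-only variant of the same percentage is sketched below on hypothetical rows. It relies on the fact that the mean of a boolean Series is the fraction of True values.

```python
import pandas as pd

# Hypothetical drivers (not the article's data)
df = pd.DataFrame({
    "Name": ["Driver A", "Driver B", "Driver C", "Driver D"],
    "Trips Completed": [15, 8, 12, 10],
    "Accept Rate": [95, 85, 100, 90],
    "Rating": [4.8, 4.9, 4.6, 4.7],
})

# True for drivers with low activity but a high rating
mask = (
    (df["Trips Completed"] < 10)
    & (df["Accept Rate"] < 90)
    & (df["Rating"] >= 4.7)
)

# mean() of a boolean mask gives the share of matching rows
percentage = round(100 * mask.mean(), 2)
print(f"{int(mask.sum())} out of {len(df)} drivers => {percentage}%")
```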

. Question 5: Calculating annual profit without partnering with Uber

In the fifth question, we need to calculate the taxi driver’s annual profit without partnering with Uber, based on the given cost and revenue parameters.

!! Step 1: Deriving annual revenue and expenses with SQL (pandasql)

Using SQL, we first calculate the annual revenue from daily fares and the annual expenses for gas, rent, and insurance.

taxi_stats = run("""
SELECT
    200*6*(52-3)                      AS annual_revenue,
    ((200+500)*(52-3) + 400*12)       AS annual_expenses
""")
taxi_stats

Here is an output.

!! Step 2: Deriving profit and margin with pandas

In the next step, we will use pandas to calculate the driver’s profit and margin when not partnering with Uber.

rev  = taxi_stats.loc[0, "annual_revenue"]
cost = taxi_stats.loc[0, "annual_expenses"]

profit  = rev - cost
margin  = round(100 * profit / rev, 2)

print(f"Revenue  : ${rev:,}")
print(f"Expenses : ${cost:,}")
print(f"Profit   : ${profit:,}    (margin: {margin}%)")

This is what we get.
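The arithmetic behind these figures can be spelled out with named values. The variable names and the meaning attached to each constant are my reading of the SQL expressions (for example, treating 200 and 6 as daily fares and working days per week), not something stated explicitly in this section.

```python
# Assumed interpretation of the constants in the SQL query
weeks_worked = 52 - 3          # weeks in a year minus 3 weeks off
daily_fares = 200              # assumed: gross fares per working day
days_per_week = 6              # assumed: working days per week
gas_per_week = 200
rent_per_week = 500
insurance_per_month = 400

annual_revenue = daily_fares * days_per_week * weeks_worked
annual_expenses = (gas_per_week + rent_per_week) * weeks_worked + insurance_per_month * 12

profit = annual_revenue - annual_expenses
margin = round(100 * profit / annual_revenue, 2)

print(f"Revenue  : ${annual_revenue:,}")
print(f"Expenses : ${annual_expenses:,}")
print(f"Profit   : ${profit:,}    (margin: {margin}%)")
```

These values agree with the revenue (58800) and profit (19700) that the next question reuses.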

. Question 6: Calculating the required fare increase to maintain profitability

In the sixth question, we assume that the same driver decides to buy a Town Car and partner with Uber.

Gas costs increase by 5%, insurance decreases by 20%, and rental costs are eliminated, but the driver needs to cover the $40,000 cost of the car. We are asked to calculate how much the driver’s weekly gross fares must increase in the first year to both pay off the car and maintain the same annual profit margin.

!! Step 1: Building the new one-year expense stack with SQL

At this stage, we use SQL to compute the new one-year expenses: the cost of the car plus the adjusted gas and insurance costs, with the rental fee removed.

new_exp = run("""
SELECT
    40000             AS car,
    200*1.05*(52-3)   AS gas,        -- +5 %
    400*0.80*12       AS insurance   -- –20 %
""")
new_cost = new_exp.sum(axis=1).iloc[0]
new_cost

Here is an output.

!! Step 2: Calculating weekly fares with pandas

Next, we use Python to calculate how much more the driver has to earn every week to preserve the same margin after buying the car.

# Existing values from Question 5
old_rev    = 58800
old_profit = 19700
old_margin = old_profit / old_rev
weeks      = 49

# new_cost was calculated in the previous step (54130.0)

# We need to find the new revenue (new_rev) such that the profit margin remains the same:
# (new_rev - new_cost) / new_rev = old_margin
# Solving for new_rev gives: new_rev = new_cost / (1 - old_margin)
new_rev_required = new_cost / (1 - old_margin)

# The total increase in annual revenue needed is the difference
total_increase = new_rev_required - old_rev

# Divide by the number of working weeks to get the required weekly increase
weekly_bump = round(total_increase / weeks, 2)

print(f"Required weekly gross-fare increase = ${weekly_bump}")

This is what we get.
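As a sanity check, the required revenue can be plugged back into the margin formula to confirm that the margin is unchanged. This is a self-contained sketch that recomputes the new cost from the same constants used in Step 1; the variable names new_rev and new_margin are introduced here for illustration.

```python
# Values carried over from Question 5
old_rev = 58800
old_profit = 19700
weeks = 49
old_margin = old_profit / old_rev

# New first-year costs: car + gas (+5%) + insurance (-20%), no rent
new_cost = 40000 + 200 * 1.05 * 49 + 400 * 0.80 * 12

# Revenue that keeps the margin constant: (new_rev - new_cost) / new_rev = old_margin
new_rev = new_cost / (1 - old_margin)

# Plug the result back in; the two margins should agree up to rounding error
new_margin = (new_rev - new_cost) / new_rev
print(f"Old margin: {old_margin:.6f}, new margin: {new_margin:.6f}")

weekly_bump = round((new_rev - old_rev) / weeks, 2)
print(f"Required weekly gross-fare increase = ${weekly_bump}")
```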

. Conclusion

In short, by combining the strengths of SQL and Python with pandasql, we solved six different problems.

SQL helps with quick filtering and summarizing of structured datasets, while Python excels at advanced computation and dynamic manipulation.

Throughout this analysis, we took advantage of both tools to simplify the workflow and make each step easier to explain.

Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
