
Photo by Author | Canva
Pandas is a library with a large and rapidly growing community. This popularity has opened the door for alternatives such as Polars. In this article, we will explore another such alternative, DuckDB.
DuckDB is an SQL database that you can run right in your notebook. No setup is needed, and no server is required. It is easy to install and works alongside pandas.
Unlike other SQL databases, you do not need to set up a server. It simply works in your notebook after installation. That means no local setup headaches: you are writing code right away. DuckDB handles filtering, joins, and aggregations with clean SQL syntax, compared to pandas, and performs significantly better on large datasets.
With the introduction out of the way, let's get started!
Data Project – Uber Business Modeling
We will use DuckDB in a Jupyter Notebook, combining it with Python for data analysis. To make things more interesting, we will work on a real-life data project. Let's start!
Here is the link to the data project we will use in this article. It is an Uber data project called Partner Business Modeling.
Uber used this data project in its recruitment process for data science positions, and you will be asked to analyze the data for two different scenarios.
- Scenario 1: Compare the cost of two bonus programs designed to get more drivers online on a busy day.
- Scenario 2: Calculate and compare the annual net income of a traditional taxi driver vs. one who partners with Uber and buys a car.
Loading the Dataset
Let's load the DataFrame first. This step is needed because we will register this dataset with DuckDB in the following sections.
import pandas as pd
df = pd.read_csv("dataset_2.csv")
Exploring the Dataset
Here are the first few rows:
Let's see all the columns.
Here is the output.
Connecting DuckDB and Registering the DataFrame
This is a really straightforward dataset, but how can we connect DuckDB to it?
First, if you have not installed it yet, install DuckDB (for example, with pip install duckdb).
Connecting with DuckDB is easy. Also, if you want to read the documentation, check it out here.
Now, here is the code to make a connection and register the DataFrame.
import duckdb
con = duckdb.connect()
# Register the DataFrame under the name "data", which the queries below use in FROM data
con.register("data", df)
Now, let's start exploring seven queries that will save you hours of pandas work!
1. Multi-Criteria Filtering for Complex Eligibility Rules
One of the most significant advantages of SQL is how naturally it handles filtering, especially multi-condition filtering.
Implementation of Multi-Criteria Filtering in DuckDB vs. Pandas
DuckDB allows you to apply multiple filters using SQL's WHERE clause and AND logic, which scales well as the number of filters grows.
SELECT
*
FROM data
WHERE condition_1
AND condition_2
AND condition_3
AND condition_4
Now let's see how we would write the same logic in pandas. In pandas, the same logic is expressed with boolean masks wrapped in parentheses and chained with &, which gets verbose as the number of conditions grows.
filtered_df = df[
    (df["condition_1"]) &
    (df["condition_2"]) &
    (df["condition_3"]) &
    (df["condition_4"])
]
Both approaches are equally readable for basic use. DuckDB feels more natural and cleaner as the logic becomes more complicated.
Multi-Criteria Filtering for the Uber Data Project
In this case, we want to find drivers who qualify for a specific Uber bonus program.
According to the rules, drivers must:
- Be online for at least 8 hours
- Complete at least 10 trips
- Accept at least 90% of ride requests
- Have a rating of 4.7 or above
Now all we have to do is write a single query that applies all of these filters. Here is the code.
SELECT
COUNT(*) AS qualified_drivers,
COUNT(*) * 50 AS total_payout
FROM data
WHERE "Supply Hours" >= 8
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
AND "Trips Completed" >= 10
AND Rating >= 4.7
However, to run this code from Python, we need to wrap it in con.execute(""" """) and call the fetchdf() method, as shown below:
con.execute("""
SELECT
COUNT(*) AS qualified_drivers,
COUNT(*) * 50 AS total_payout
FROM data
WHERE "Supply Hours" >= 8
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
AND "Trips Completed" >= 10
AND Rating >= 4.7
""").fetchdf()
We will do this throughout the article. Now that you know how to run a query in a Jupyter Notebook, we will show only the SQL code from now on, and you will know how to convert it to the Pythonic version.
Remember that the data project wants us to calculate the total payout for Option 1.
We have counted the qualifying drivers, but each of them is paid $50, so we multiply the count by 50 with COUNT(*) * 50.
Here is the output.
Multi-criteria filtering in DuckDB
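For comparison, the Option 1 rule can also be written with pandas masks. This is a minimal sketch on five made-up rows, not the project data; the column names follow the query above, and "Accept Rate" is assumed to be stored as text such as "95%", which is what the REPLACE call in the SQL implies.

```python
import pandas as pd

# Hypothetical stand-in rows; column names follow the article's queries.
drivers = pd.DataFrame({
    "Trips Completed": [12, 9, 15, 10, 10],
    "Accept Rate": ["95%", "100%", "80%", "90%", "90%"],
    "Supply Hours": [9, 10, 8, 8, 8],
    "Rating": [4.8, 4.9, 4.7, 4.6, 4.7],
})

# Strip the percent sign so the rate can be compared as a number.
accept_rate = (
    drivers["Accept Rate"].str.replace("%", "", regex=False).astype(float)
)

qualified = drivers[
    (drivers["Supply Hours"] >= 8)
    & (accept_rate >= 90)
    & (drivers["Trips Completed"] >= 10)
    & (drivers["Rating"] >= 4.7)
]

qualified_drivers = len(qualified)
total_payout = qualified_drivers * 50  # $50 per qualifying driver
print(qualified_drivers, total_payout)
```

On these rows, only the first and last drivers pass all four conditions, so the payout is 2 * 50.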
2. Fast Aggregation to Estimate Business Incentives
SQL is great for quick aggregation, especially when you need to summarize data across rows.
Implementation of Aggregation in DuckDB vs. Pandas
SELECT
COUNT(*) AS num_rows,
SUM(column_name) AS total_value
FROM data
WHERE some_condition
DuckDB lets you aggregate values across rows using SQL functions such as SUM and COUNT in one compact block.
filtered = df[df["some_condition"]]
num_rows = filtered.shape[0]
total_value = filtered["column_name"].sum()
In pandas, you first need to filter the DataFrame, then count and sum separately using chained methods.
DuckDB is more concise and easier to read, and it does not require managing intermediate variables.
Aggregation in the Uber Data Project
Let's move on to the second bonus scheme, Option 2. According to the project description, drivers will get $4 per trip if:
- They complete at least 12 trips.
- They have a rating of 4.7 or better.
SELECT
COUNT(*) AS qualified_drivers,
SUM("Trips Completed") * 4 AS total_payout
FROM data
WHERE "Trips Completed" >= 12
AND Rating >= 4.7
This time, instead of counting the drivers, we need to sum the trips they completed, because the bonus is paid per trip, not per person.
The COUNT here tells us how many drivers qualify. To calculate the total payout, however, we sum their trips and multiply by $4, as Option 2 requires.
Here is the output.
Aggregation in DuckDB
With DuckDB, we do not need to loop through rows or build custom aggregations. The SUM function takes care of everything we need.
3. Detect Overlaps and Differences Using Boolean Logic
In SQL, you can easily combine conditions using Boolean logic, such as AND, OR, and NOT.
Implementation of Boolean Logic in DuckDB vs. Pandas
SELECT *
FROM data
WHERE condition_a
AND condition_b
AND NOT (condition_c)
DuckDB supports Boolean logic natively in the WHERE clause using AND, OR, and NOT.
filtered = df[
    (df["condition_a"]) &
    (df["condition_b"]) &
    ~(df["condition_c"])
]
Pandas requires combining logical operators with masks and parentheses, including the use of "~" for negation.
Although both work, DuckDB is easier to reason about when the logic involves exclusions or nested conditions.
Boolean Logic for the Uber Data Project
Now that we have calculated Option 1 and Option 2, what comes next? It is time to compare them. The next question asks how many drivers qualify for Option 1 but not for Option 2.
Boolean logic in DuckDB
SELECT COUNT(*) AS only_option1
FROM data
WHERE "Supply Hours" >= 8
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
AND "Trips Completed" >= 10
AND Rating >= 4.7
AND NOT ("Trips Completed" >= 12 AND Rating >= 4.7)
This is where we can use Boolean logic. We will use a combination of AND and NOT.
Here is the output.
Boolean logic in DuckDB
Let's break it down:
- The first four conditions are the Option 1 rules.
- The NOT (..) part excludes drivers who also qualify for Option 2.
It is pretty straightforward, right?
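The same overlap analysis can be sketched with named boolean masks in pandas, which also makes it easy to count the other segments (both options, or Option 2 only). The five rows below are made up for illustration; the thresholds are the ones stated in the bonus rules above.

```python
import pandas as pd

# Hypothetical stand-in rows; column names follow the article's queries.
drivers = pd.DataFrame({
    "Trips Completed": [12, 9, 15, 10, 10],
    "Accept Rate": ["95%", "100%", "80%", "90%", "90%"],
    "Supply Hours": [9, 10, 8, 8, 8],
    "Rating": [4.8, 4.9, 4.7, 4.6, 4.7],
})
accept_rate = (
    drivers["Accept Rate"].str.replace("%", "", regex=False).astype(float)
)

# One named mask per bonus program.
option_1 = (
    (drivers["Supply Hours"] >= 8)
    & (accept_rate >= 90)
    & (drivers["Trips Completed"] >= 10)
    & (drivers["Rating"] >= 4.7)
)
option_2 = (drivers["Trips Completed"] >= 12) & (drivers["Rating"] >= 4.7)

only_option_1 = int((option_1 & ~option_2).sum())  # AND NOT, as in the SQL
both_options = int((option_1 & option_2).sum())
only_option_2 = int((~option_1 & option_2).sum())
print(only_option_1, both_options, only_option_2)
```

Summing a boolean mask counts its True values, so each line is the pandas counterpart of a COUNT(*) with the matching WHERE clause.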
4. Quick Cohort Sizing with Conditional Filters
Sometimes, you want to understand how big a specific group or cohort is within your data.
Implementation of Conditional Filters in DuckDB vs. Pandas
SELECT
ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM data), 2) AS percentage
FROM data
WHERE condition_1
AND condition_2
AND condition_3
DuckDB handles cohort filtering and the percentage calculation in a single SQL query, using a subquery for the total row count.
filtered = df[
    (df["condition_1"]) &
    (df["condition_2"]) &
    (df["condition_3"])
]
percentage = round(100.0 * len(filtered) / len(df), 2)
Pandas requires filtering, counting, and manual division to calculate the percentage.
DuckDB is cleaner and faster here. It minimizes the number of steps and avoids repeated code.
Cohort Sizing for the Uber Data Project
We are now on the last question of Scenario 1. In this question, Uber wants us to find the drivers who missed some targets, such as trips and acceptance rate, yet still had a high rating. Specifically, the drivers who:
- Completed fewer than 10 trips
- Had an acceptance rate below 90%
- Had a rating of 4.7 or higher
SELECT
ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM data), 2) AS percentage
FROM data
WHERE "Trips Completed" < 10
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) < 90
AND Rating >= 4.7
These are three separate filters, and the query above calculates the percentage of drivers who satisfy all of them.
Here is the output.
Cohort sizing in DuckDB
Here, we filtered the rows where all three conditions were satisfied, counted them, and divided the count by the total number of drivers.
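As a quick check of the arithmetic behind the percentage, here is a minimal pandas sketch on six made-up drivers. It assumes the cohort definition listed above (fewer than 10 trips, acceptance rate below 90, rating of 4.7 or higher); the rows themselves are not from the project data.

```python
import pandas as pd

# Hypothetical stand-in rows; column names follow the article's queries.
drivers = pd.DataFrame({
    "Trips Completed": [12, 9, 15, 10, 10, 8],
    "Accept Rate": ["95%", "100%", "80%", "90%", "90%", "85%"],
    "Supply Hours": [9, 10, 8, 8, 8, 9],
    "Rating": [4.8, 4.9, 4.7, 4.6, 4.7, 4.9],
})
accept_rate = (
    drivers["Accept Rate"].str.replace("%", "", regex=False).astype(float)
)

cohort = drivers[
    (drivers["Trips Completed"] < 10)
    & (accept_rate < 90)
    & (drivers["Rating"] >= 4.7)
]

# Cohort size divided by the total number of drivers, as in the subquery.
percentage = round(100.0 * len(cohort) / len(drivers), 2)
print(len(cohort), percentage)
```

Only the last sample driver meets all three conditions, so the cohort is 1 of 6 drivers, or 16.67%.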
5. Basic Arithmetic Queries for Revenue Modeling
Now, let's say you want to do some basic math. You can write expressions directly in your SELECT statement.
Implementation of Arithmetic in DuckDB vs. Pandas
SELECT
daily_income * work_days * weeks_per_year AS annual_revenue,
weekly_cost * weeks_per_year AS total_cost,
(daily_income * work_days * weeks_per_year) - (weekly_cost * weeks_per_year) AS net_income
FROM data
DuckDB lets you write arithmetic directly in the SELECT clause, like a calculator.
daily_income = 200
weeks_per_year = 49
work_days = 6
weekly_cost = 500
annual_revenue = daily_income * work_days * weeks_per_year
total_cost = weekly_cost * weeks_per_year
net_income = annual_revenue - total_cost
Pandas requires several intermediate calculations stored in separate variables.
DuckDB keeps the math logic in one easy-to-read SQL block, while the pandas version gets slightly cluttered with variable assignments.
Basic Math in the Uber Data Project
In Scenario 2, Uber asks us to calculate how much money the driver makes per year (after expenses) without partnering with Uber. The expenses include gas, rent, and insurance.
Basic math in DuckDB
SELECT
200 * 6 * (52 - 3) AS annual_revenue,
200 * (52 - 3) AS gas_expense,
500 * (52 - 3) AS rent_expense,
400 * 12 AS insurance_expense,
(200 * 6 * (52 - 3))
- (200 * (52 - 3) + 500 * (52 - 3) + 400 * 12) AS net_income
This query calculates the annual revenue and subtracts the expenses.
Here is the output.
Basic math in DuckDB
With DuckDB, you can write this as a single block of SQL arithmetic. You don't need pandas DataFrames or manual looping!
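The numbers in the query can be verified by hand. The short sketch below repeats the same arithmetic in plain Python with named constants; the variable names are illustrative, while the values are the ones used in the query (a $200 daily fare, 6 days a week, 3 weeks off, $200 gas and $500 rent per week, $400 insurance per month).

```python
# Inputs taken from the query above.
WEEKS_WORKED = 52 - 3          # 49 working weeks
FARE_PER_DAY = 200
DAYS_PER_WEEK = 6
GAS_PER_WEEK = 200
RENT_PER_WEEK = 500
INSURANCE_PER_MONTH = 400

annual_revenue = FARE_PER_DAY * DAYS_PER_WEEK * WEEKS_WORKED   # 58800
gas_expense = GAS_PER_WEEK * WEEKS_WORKED                      # 9800
rent_expense = RENT_PER_WEEK * WEEKS_WORKED                    # 24500
insurance_expense = INSURANCE_PER_MONTH * 12                   # 4800

total_expense = gas_expense + rent_expense + insurance_expense # 39100
net_income = annual_revenue - total_expense                    # 19700
print(annual_revenue, total_expense, net_income)
```

Under these inputs, the traditional taxi driver keeps $19,700 per year after expenses.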
6. Conditional Calculations for Dynamic Expense Planning
What if your cost structure changes based on certain conditions?
Implementation of Conditional Calculations in DuckDB vs. Pandas
SELECT
original_cost * 1.05 AS increased_cost,
original_cost * 0.8 AS discounted_cost,
0 AS removed_cost,
(original_cost * 1.05 + original_cost * 0.8) AS total_new_cost
DuckDB lets you apply conditional logic through arithmetic adjustments inside your query.
weeks_worked = 49
gas = 200
insurance = 400
gas_expense = gas * 1.05 * weeks_worked
insurance_expense = insurance * 0.8 * 12
rent_expense = 0
total = gas_expense + insurance_expense
Pandas expresses the same logic with several lines of arithmetic and manual variable updates.
DuckDB condenses what would be multi-step logic in pandas into a single SQL expression.
Conditional Calculations in the Uber Data Project
In this scenario, we model what happens if the driver partners with Uber and buys a car. The expenses change as follows:
- The cost of gas increases by 5%
- Insurance decreases by 20%
- There is no more rent expense
con.execute("""
SELECT
    200 * 1.05 * 49 AS gas_expense,
    400 * 0.8 * 12 AS insurance_expense,
    0 AS rent_expense,
    (200 * 1.05 * 49) + (400 * 0.8 * 12) AS total_expense
""").fetchdf()
Here is the output.
Conditional calculation in DuckDB
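To make the "conditional" part explicit, the two cost structures can be put behind a single switch. The function below is a hypothetical helper, not part of the project; it applies the adjustments listed above (gas up 5%, insurance down 20%, no rent) only when the driver partners with Uber.

```python
WEEKS_WORKED = 52 - 3  # 49 working weeks, as in the queries above


def annual_expenses(partners_with_uber: bool) -> dict:
    """Return yearly expenses for either cost structure (illustrative helper)."""
    gas_per_week = 200.0
    insurance_per_month = 400.0
    rent_per_week = 500.0

    if partners_with_uber:
        gas_per_week *= 1.05         # gas increases by 5%
        insurance_per_month *= 0.8   # insurance decreases by 20%
        rent_per_week = 0.0          # the driver owns the car, so no rent

    expenses = {
        "gas": gas_per_week * WEEKS_WORKED,
        "insurance": insurance_per_month * 12,
        "rent": rent_per_week * WEEKS_WORKED,
    }
    expenses["total"] = sum(expenses.values())
    return expenses


taxi = annual_expenses(partners_with_uber=False)
partner = annual_expenses(partners_with_uber=True)
print(round(taxi["total"], 2), round(partner["total"], 2))
```

The partner total excludes the car purchase itself, which the next section adds as a one-time cost.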
7. Goal-Driven Math for Revenue Targeting
Sometimes, your analysis is driven by a business goal, such as hitting a revenue target or covering a one-time cost.
Implementation of Goal-Driven Math in DuckDB vs. Pandas
WITH vars AS (
    SELECT base_income, cost_1, cost_2, target_item
),
calc AS (
    SELECT
        base_income - (cost_1 + cost_2) AS current_profit,
        cost_1 * 1.1 + cost_2 * 0.8 + target_item AS new_total_expense
    FROM vars
),
final AS (
    SELECT
        current_profit + new_total_expense AS required_revenue,
        required_revenue / 49 AS required_weekly_income
    FROM calc
)
SELECT required_weekly_income FROM final
DuckDB handles multi-step logic with CTEs, which keeps the query modular and easy to read.
weeks = 49
original_income = 200 * 6 * weeks
original_cost = (200 + 500) * weeks + 400 * 12
net_income = original_income - original_cost
# new expenses + car cost
new_gas = 200 * 1.05 * weeks
new_insurance = 400 * 0.8 * 12
car_cost = 40000
required_revenue = net_income + new_gas + new_insurance + car_cost
required_weekly_income = required_revenue / weeks
Pandas requires chaining the calculations and reusing earlier variables to avoid duplication.
Dick DB allows you to build a logic pipeline step by step, without cluttering your notebook with scattered code.
Goal-Driven Math in the Uber Data Project
Now that we have modeled the new expenses, let's answer the final business question:
How much does the driver need to earn each week to do both of the following?
- Pay off a $40,000 car within a year
- Maintain the same annual net income
WITH vars AS (
SELECT
52 AS total_weeks_per_year,
3 AS weeks_off,
6 AS days_per_week,
200 AS fare_per_day,
400 AS monthly_insurance,
200 AS gas_per_week,
500 AS vehicle_rent,
40000 AS car_cost
),
base AS (
SELECT
total_weeks_per_year,
weeks_off,
days_per_week,
fare_per_day,
monthly_insurance,
gas_per_week,
vehicle_rent,
car_cost,
total_weeks_per_year - weeks_off AS weeks_worked,
(fare_per_day * days_per_week * (total_weeks_per_year - weeks_off)) AS original_annual_revenue,
(gas_per_week * (total_weeks_per_year - weeks_off)) AS original_gas,
(vehicle_rent * (total_weeks_per_year - weeks_off)) AS original_rent,
(monthly_insurance * 12) AS original_insurance
FROM vars
),
compare AS (
SELECT *,
(original_gas + original_rent + original_insurance) AS original_total_expense,
(original_annual_revenue - (original_gas + original_rent + original_insurance)) AS original_net_income
FROM base
),
new_costs AS (
SELECT *,
gas_per_week * 1.05 * weeks_worked AS new_gas,
monthly_insurance * 0.8 * 12 AS new_insurance
FROM compare
),
final AS (
SELECT *,
new_gas + new_insurance + car_cost AS new_total_expense,
original_net_income + new_gas + new_insurance + car_cost AS required_revenue,
required_revenue / weeks_worked AS required_weekly_revenue,
original_annual_revenue / weeks_worked AS original_weekly_revenue
FROM new_costs
)
SELECT
ROUND(required_weekly_revenue, 2) AS required_weekly_revenue,
ROUND(required_weekly_revenue - original_weekly_revenue, 2) AS weekly_uplift
FROM final
The query above encodes this logic step by step, one CTE per stage.
Here is the output.
Goal-driven math in DuckDB
Final Thoughts
In this article, we explored how to connect to DuckDB and analyze data with it. Instead of using long pandas functions, we used SQL queries. We also did this using a real-life data project that Uber used in its recruitment process for data scientists.
For data scientists working on analysis-heavy tasks, DuckDB is a lightweight but powerful alternative to pandas. Try using it on your next project, especially when SQL logic fits the problem better.
The author is a data scientist who works in product strategy. He is also an adjunct professor teaching analytics and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. He writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.