7 DuckDB SQL Queries That Save You Hours of Pandas Work

Photo by Author | Canva

The Pandas library has a large and rapidly growing community. This popularity has opened the door for alternatives, such as Polars. In this article, we will explore another such alternative, DuckDB.

DuckDB is an SQL database that you can run right in your notebook. No setup is needed, and no server is required. It is easy to install and can work alongside Pandas.

Unlike other SQL databases, you do not need to configure a server. It simply works inside your notebook after installation. That means no local setup headaches: you are writing code right away. DuckDB handles filtering, joins, and aggregations with clean SQL syntax, and it can perform significantly better than Pandas on large datasets.

So, with the introductions out of the way, let’s begin!

Data Project – Uber Business Modeling

We will use DuckDB in a Jupyter Notebook, combining it with Python for data analysis. To make things more interesting, we will work on a real-life data project. Let’s get started!

Example of a data project for DuckDB SQL queries

Here is the link to the data project we will use in this article. It is a data project from Uber called Partner Business Modeling.

Uber used this data project in its recruitment process for data science positions, and you will be asked to analyze the data for two different scenarios.

Scenario 1: Compare the cost of two bonus programs designed to get more drivers online during a busy day.
Scenario 2: Calculate and compare the annual net income of a traditional taxi driver vs. one who partners with Uber and buys a car.

Loading the Dataset

Let’s load the data frame first. This step is required because we will register this dataset with DuckDB in the following sections.

import pandas as pd
df = pd.read_csv("dataset_2.csv")

Dataset Exploration

Here are the first few rows:

Let’s see all the columns.

Here is an output.

Connect DuckDB and Register the Data Frame

Well, this is a really straightforward dataset, but how can we connect DuckDB to it?
First, if you haven’t installed it yet, install DuckDB (for example, with pip install duckdb).

Connecting with DuckDB is easy. If you want to read the documentation, check it out here.

Now, here is the code to create a connection and register the data frame.

import duckdb
con = duckdb.connect()

con.register("data", df)  # the queries below refer to this table as "data"

Well, let’s start exploring seven queries that will save you hours of Pandas work!

1. Multi-Criteria Filtering for Complex Eligibility Rules

One of the most significant advantages of SQL is how naturally it handles filtering, especially multi-condition filtering.

Implementation of Multi-Criteria Filtering in DuckDB vs Pandas

DuckDB allows you to apply multiple filters using the SQL WHERE clause and logical operators, and this approach scales well as the number of filters grows.

SELECT 
    *
FROM data
WHERE condition_1
  AND condition_2
  AND condition_3
  AND condition_4

Now let’s see how we would write the same logic in Pandas. In Pandas, the same logic is expressed using Boolean masks chained inside brackets, which can get verbose when there are many conditions.

filtered_df = df[
    (df["condition_1"]) &
    (df["condition_2"]) &
    (df["condition_3"]) &
    (df["condition_4"])
]

Both methods are equally readable and work well for basic use. However, DuckDB feels more natural and cleaner as the logic gets more complex.

Multi-Criteria Filtering for the Uber Data Project

In this case, we want to find drivers who qualify for a specific Uber bonus program.

According to the rules, drivers must:

Be online for at least 8 hours
Complete at least 10 trips
Accept at least 90% of ride requests
Have a rating of 4.7 or above

Now all we need to do is write a single query that applies all of these filters. Here is the code.

SELECT 
    COUNT(*) AS qualified_drivers,
    COUNT(*) * 50 AS total_payout
FROM data
WHERE "Supply Hours" >= 8
  AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
  AND "Trips Completed" >= 10
  AND Rating >= 4.7

But to run this code in Python, we need to wrap it in con.execute(""" """) and call the fetchdf() method, as shown below:

con.execute("""
SELECT 
    COUNT(*) AS qualified_drivers,
    COUNT(*) * 50 AS total_payout
FROM data
WHERE "Supply Hours" >= 8
  AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
  AND "Trips Completed" >= 10
  AND Rating >= 4.7
""").fetchdf()

We will do this throughout the article. Now that you know how to run a query in a Jupyter Notebook, we will show only the SQL code from now on, and you will know how to convert it into its Pythonic version.
Now, remember that the data project asks us to calculate the total payout for Option 1.

We have counted the qualifying drivers, but we still need to multiply that count by $50, because each qualifying driver is paid $50. That is what COUNT(*) * 50 does.

Here is the output.

Multi-criteria filtering

2. Fast Aggregation to Estimate Business Incentives

SQL is great for quick aggregation, especially when you need to summarize data across rows.

Implementation of Aggregation in DuckDB vs Pandas

SELECT 
    COUNT(*) AS num_rows,
    SUM(column_name) AS total_value
FROM data
WHERE some_condition

DuckDB allows you to aggregate values across rows using SQL functions such as SUM and COUNT in one compact block.

filtered = df[df["some_condition"]]
num_rows = filtered.shape[0]
total_value = filtered["column_name"].sum()

In Pandas, you first need to filter the data frame, then count and sum separately using chained methods.

DuckDB is more concise and easier to read, and there is no need to manage intermediate variables.

Aggregation in the Uber Data Project

Now let’s move on to the second bonus scheme, Option 2. According to the project description, drivers will get $4 per trip if:

They complete at least 12 trips.
They have a rating of 4.7 or better.

SELECT 
    COUNT(*) AS qualified_drivers,
    SUM("Trips Completed") * 4 AS total_payout
FROM data
WHERE "Trips Completed" >= 12
  AND Rating >= 4.7

This time, instead of counting the drivers, we need to sum the number of trips they completed, since the bonus is paid per trip, not per person.

The count here tells us how many drivers qualify. To calculate the total payout, however, we sum their trips and multiply by $4, as Option 2 requires.

Aggregation in DuckDB

Here is the output.

Aggregation in DuckDB

With DuckDB, we do not need to loop through rows or build custom aggregations. The SUM function takes care of everything we need.

3. Detect Overlaps and Differences Using Boolean Logic

In SQL, you can easily combine conditions using Boolean logic, such as AND, OR, and NOT.

Boolean Logic Implementation in DuckDB vs Pandas

SELECT *
FROM data
WHERE condition_a
  AND condition_b
  AND NOT (condition_c)

DuckDB supports Boolean logic natively in the WHERE clause using AND, OR, and NOT.

filtered = df[
    (df["condition_a"]) &
    (df["condition_b"]) &
    ~(df["condition_c"])
]

Pandas requires a combination of logical operators with masks and parentheses, including the use of “~” for negation.

Although both work, DuckDB is easier to reason about when the logic contains exclusions or nested conditions.

Boolean Logic for the Uber Data Project

Now that we have calculated Option 1 and Option 2, what comes next? It is time to compare them. Our next question asks how many drivers qualify for Option 1 but not for Option 2.

Boolean logic in DuckDB

SELECT COUNT(*) AS only_option1
FROM data
WHERE "Supply Hours" >= 8
  AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
  AND "Trips Completed" >= 10
  AND Rating >= 4.7
  AND NOT ("Trips Completed" >= 12 AND Rating >= 4.7)

This is where we can use Boolean logic. We will use a combination of AND and NOT.

Here is the output.

Boolean logic in DuckDB

Let’s break it down:
The first four conditions are the Option 1 rules.

The NOT (...) part is used to exclude drivers who also qualify for Option 2.

It’s pretty straightforward, right?

4. Quick Cohort Sizing with Conditional Filters

Sometimes, you want to understand how big a specific group or cohort is within your data.

Implementation of Conditional Filters in DuckDB vs Pandas

SELECT 
  ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM data), 2) AS percentage
FROM data
WHERE condition_1
  AND condition_2
  AND condition_3

DuckDB handles cohort filtering and the percentage calculation in a single SQL query, including the subquery for the total row count.

filtered = df[
    (df["condition_1"]) &
    (df["condition_2"]) &
    (df["condition_3"])
]
percentage = round(100.0 * len(filtered) / len(df), 2)

Pandas requires filtering, counting, and a manual division to calculate the percentage.

DuckDB is cleaner here. It minimizes the number of steps and avoids repeated code.

Cohort Sizing for the Uber Data Project

We are now on the last question of Scenario 1. In this question, Uber wants us to find the percentage of drivers who failed to meet some requirements, such as trips and acceptance rate, yet still had a high rating. Specifically, these are drivers who:
Completed fewer than 10 trips
Had an acceptance rate below 90%
Had a rating of 4.7 or higher

SELECT 
  ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM data), 2) AS percentage
FROM data
WHERE "Trips Completed" < 10
  AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) < 90
  AND Rating >= 4.7

These are three separate filters, and the query above calculates the percentage of drivers who satisfy all of them.

Here is the output.

Cohort sizing in DuckDB

Here, we filtered the rows where all three conditions were satisfied, counted them, and divided that count by the total number of drivers.

5. Basic Arithmetic Queries for Revenue Modeling

Now, let’s say you want to do some basic math. You can write expressions directly in your SELECT statement.

Implementation of Arithmetic in DuckDB vs Pandas

SELECT 
    daily_income * work_days * weeks_per_year AS annual_revenue,
    weekly_cost * weeks_per_year AS total_cost,
    (daily_income * work_days * weeks_per_year) - (weekly_cost * weeks_per_year) AS net_income
FROM data

DuckDB lets you write math directly in the SELECT clause, like a calculator.

daily_income = 200
weeks_per_year = 49
work_days = 6
weekly_cost = 500

annual_revenue = daily_income * work_days * weeks_per_year
total_cost = weekly_cost * weeks_per_year
net_income = annual_revenue - total_cost

The Python version requires several intermediate calculations stored in separate variables.

DuckDB keeps the math logic readable in a single SQL block, while the Python version gets slightly cluttered with variable assignments.

Basic Arithmetic in the Uber Data Project

In Scenario 2, Uber asks us to calculate how much money the driver makes per year (after expenses) without partnering with Uber. The expenses include gas, rent, and insurance.

Basic arithmetic in DuckDB

SELECT 
    200 * 6 * (52 - 3) AS annual_revenue,
    200 * (52 - 3) AS gas_expense,
    500 * (52 - 3) AS rent_expense,
    400 * 12 AS insurance_expense,
    (200 * 6 * (52 - 3)) 
      - (200 * (52 - 3) + 500 * (52 - 3) + 400 * 12) AS net_income

The query calculates the annual revenue and then subtracts the expenses.

Here is the output.

Basic arithmetic in DuckDB

With DuckDB, you can write this as a single block of SQL math. You don’t need Pandas data frames or manual looping!

6. Conditional Calculations for Dynamic Expense Planning

What if your cost structure changes based on certain conditions?

Implementation of Conditional Calculations in DuckDB vs Pandas

SELECT 
    original_cost * 1.05 AS increased_cost,
    original_cost * 0.8 AS discounted_cost,
    0 AS removed_cost,
    (original_cost * 1.05 + original_cost * 0.8) AS total_new_cost
FROM data

DuckDB lets you apply conditional logic through arithmetic adjustments within your query.

weeks_worked = 49
gas = 200
insurance = 400

gas_expense = gas * 1.05 * weeks_worked
insurance_expense = insurance * 0.8 * 12
rent_expense = 0
total = gas_expense + insurance_expense

The Python version uses the same logic, with manual updates spread across multiple lines of math and variables.

DuckDB turns what would be multi-step logic in Python into a single SQL expression.

Conditional Calculations in the Uber Data Project

In this scenario, we now model what happens if the driver partners with Uber and buys a car. The expenses change as follows:
Gas cost increases by 5%
Insurance decreases by 20%
No more rent expense

con.execute("""
SELECT 
    200 * 1.05 * 49 AS gas_expense,
    400 * 0.8 * 12 AS insurance_expense,
    0 AS rent_expense,
    (200 * 1.05 * 49) + (400 * 0.8 * 12) AS total_expense
""").fetchdf()

Here is the output.

Conditional calculation in DuckDB

7. Goal-Driven Math for Revenue Targeting

Sometimes, your analysis is driven by a business goal, such as hitting a revenue target or covering a one-time cost.

Implementation of Goal-Driven Math in DuckDB vs Pandas

DuckDB handles multi-step logic using CTEs:

WITH vars AS (
  SELECT base_income, cost_1, cost_2, target_item
),
calc AS (
  SELECT 
    base_income - (cost_1 + cost_2) AS current_profit,
    cost_1 * 1.1 + cost_2 * 0.8 + target_item AS new_total_expense
  FROM vars
),
final AS (
  SELECT 
    current_profit + new_total_expense AS required_revenue,
    required_revenue / 49 AS required_weekly_income
  FROM calc
)
SELECT required_weekly_income FROM final

CTEs make the query modular and easy to read.

weeks = 49
original_income = 200 * 6 * weeks
original_cost = (200 + 500) * weeks + 400 * 12
net_income = original_income - original_cost

# new expenses + car cost
new_gas = 200 * 1.05 * weeks
new_insurance = 400 * 0.8 * 12
car_cost = 40000

required_revenue = net_income + new_gas + new_insurance + car_cost
required_weekly_income = required_revenue / weeks

The Python version requires chained calculations that reuse earlier variables to avoid duplication.

Dick DB allows you to build a logic pipeline step by step, without cluttering your notebook with scattered code.

Goal-Driven Math in the Uber Data Project

Now that we have modeled the new expenses, let’s answer the final business question:

How much does the driver need to earn each week to do both of the following?
Pay off a $40,000 car within a year
Maintain the same annual net income

WITH vars AS (
  SELECT 
    52 AS total_weeks_per_year,
    3 AS weeks_off,
    6 AS days_per_week,
    200 AS fare_per_day,
    400 AS monthly_insurance,
    200 AS gas_per_week,
    500 AS vehicle_rent,
    40000 AS car_cost
),
base AS (
  SELECT 
    total_weeks_per_year,
    weeks_off,
    days_per_week,
    fare_per_day,
    monthly_insurance,
    gas_per_week,
    vehicle_rent,
    car_cost,
    total_weeks_per_year - weeks_off AS weeks_worked,
    (fare_per_day * days_per_week * (total_weeks_per_year - weeks_off)) AS original_annual_revenue,
    (gas_per_week * (total_weeks_per_year - weeks_off)) AS original_gas,
    (vehicle_rent * (total_weeks_per_year - weeks_off)) AS original_rent,
    (monthly_insurance * 12) AS original_insurance
  FROM vars
),
compare AS (
  SELECT *,
    (original_gas + original_rent + original_insurance) AS original_total_expense,
    (original_annual_revenue - (original_gas + original_rent + original_insurance)) AS original_net_income
  FROM base
),
new_costs AS (
  SELECT *,
    gas_per_week * 1.05 * weeks_worked AS new_gas,
    monthly_insurance * 0.8 * 12 AS new_insurance
  FROM compare
),
final AS (
  SELECT *,
    new_gas + new_insurance + car_cost AS new_total_expense,
    original_net_income + new_gas + new_insurance + car_cost AS required_revenue,
    required_revenue / weeks_worked AS required_weekly_revenue,
    original_annual_revenue / weeks_worked AS original_weekly_revenue
  FROM new_costs
)
SELECT 
  ROUND(required_weekly_revenue, 2) AS required_weekly_revenue,
  ROUND(required_weekly_revenue - original_weekly_revenue, 2) AS weekly_uplift
FROM final

The query above expresses this logic step by step.

Here is the output.

Goal-driven math in DuckDB
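
Because the output image is not reproduced here, the sketch below recomputes the two reported columns in plain Python, using the same inputs as the vars CTE. The variable names follow the query; treating this as a cross-check rather than the query's exact output is the safe reading, since floating-point and SQL decimal arithmetic can differ in the last digits before rounding.

```python
# Inputs copied from the vars CTE in the query above.
total_weeks_per_year = 52
weeks_off = 3
days_per_week = 6
fare_per_day = 200
monthly_insurance = 400
gas_per_week = 200
vehicle_rent = 500
car_cost = 40000

weeks_worked = total_weeks_per_year - weeks_off

# Original (taxi) scenario.
original_annual_revenue = fare_per_day * days_per_week * weeks_worked
original_total_expense = (
    gas_per_week * weeks_worked + vehicle_rent * weeks_worked + monthly_insurance * 12
)
original_net_income = original_annual_revenue - original_total_expense

# New (Uber partner) scenario: gas +5%, insurance -20%, no rent, plus the car.
new_gas = gas_per_week * 1.05 * weeks_worked
new_insurance = monthly_insurance * 0.8 * 12

# Revenue needed to keep the same net income while paying off the car.
required_revenue = original_net_income + new_gas + new_insurance + car_cost
required_weekly_revenue = round(required_revenue / weeks_worked, 2)
weekly_uplift = round(
    required_revenue / weeks_worked - original_annual_revenue / weeks_worked, 2
)

print(required_weekly_revenue, weekly_uplift)
```

Under these inputs, the driver would need roughly 1,506.73 per week, which is about 306.73 more than the original 1,200 per week.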

Final Thoughts
In this article, we explored how to connect to DuckDB and analyze data with it. Instead of writing long Pandas functions, we used SQL queries. We also did this using a real-life data project that Uber used in its data scientist recruitment process.
For data scientists working on analysis-heavy tasks, DuckDB is a lightweight but powerful alternative to Pandas. Try using it on your next project, especially when SQL logic fits the problem better.

Nate Rosidi

Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
