Top SQL Patterns from FAANG Data Science Interviews (with Code)

Photo by author

# Introduction

Technical screening for data science roles at FAANG companies is very thorough. However, even they can't come up with an endless stream of unique interview questions. Once you've gone through enough of them, you start to notice that some SQL patterns keep showing up.

Here are the top 5, with examples and code (PostgreSQL) for practice.

Photo by author Napkins

Master these and you will be ready for most SQL interviews.

# Pattern #1: Aggregating data with groups

Using aggregate functions with GROUP BY lets you aggregate metrics by category.

This pattern is often combined with data filtering, which means using one of two clauses:

WHERE: Filters data before aggregation.
HAVING: Filters the data after aggregation.
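
The difference between the two clauses can be sketched on a toy table. This is a minimal illustration, not part of the interview question: the table `comments_demo` and its rows are hypothetical, and it runs on SQLite through Python's built-in `sqlite3` module (rather than PostgreSQL) so that it works without a database server.

```python
import sqlite3

# Hypothetical toy data, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments_demo (user_id INTEGER, created_at TEXT, number_of_comments INTEGER);
    INSERT INTO comments_demo VALUES
        (1, '2020-01-05', 3),
        (1, '2020-02-01', 2),
        (2, '2020-02-03', 2),
        (2, '2020-02-04', 1),
        (3, '2020-02-05', 7);
""")

query = """
    SELECT user_id,
           SUM(number_of_comments) AS total_comments
    FROM comments_demo
    WHERE created_at >= '2020-02-01'        -- row-level filter, applied before grouping
    GROUP BY user_id
    HAVING SUM(number_of_comments) >= 3     -- group-level filter, applied after aggregation
    ORDER BY user_id;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(2, 3), (3, 7)]
```

User 1 has 5 comments overall, but WHERE discards the January row first, so only 2 comments reach the aggregation and HAVING then drops the group.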

Example: This Meta interview question asks you to find the total number of comments each user made 30 days or fewer before 2020-02-10. Users with no comments should be excluded from the output.

We use SUM() with a GROUP BY clause to total the number of comments per user. Limiting the output to comments within the specified period is achieved by filtering the data before aggregation, i.e., using WHERE. There is no need to work out by hand which date is "30 days before 2020-02-10"; we simply subtract 30 days from that date using an INTERVAL expression.

SELECT user_id,
       SUM(number_of_comments) AS number_of_comments
FROM fb_comments_count
WHERE created_at BETWEEN '2020-02-10'::DATE - 30 * INTERVAL '1 day' AND '2020-02-10'::DATE
GROUP BY user_id;

Here is the output.

user_id	number_of_comments
5	1
8	4
9	2
…	…
99	2

Business Use:

User activity metrics: DAU and MAU, churn rate.
Revenue metrics: per region/product/time period.
User engagement: average session length, average clicks per user.

# Pattern #2: Filtering with subqueries

When using subqueries for filtering, you create a subset of data, then filter the main query against it.

The two main subtypes are:

Scalar subqueries: Return a single value, e.g., the maximum amount.
Correlated subqueries: Reference values from the outer query and depend on them to return their result.
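
To make the two subtypes concrete, here is a small sketch on a hypothetical `orders_demo` table (the table and its values are assumptions for illustration). It uses SQLite via Python's `sqlite3` module so it runs without a PostgreSQL server; the subquery syntax shown is the same in both.

```python
import sqlite3

# Hypothetical toy data, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_demo (order_id INTEGER, cust_id INTEGER, amount INTEGER);
    INSERT INTO orders_demo VALUES
        (1, 10, 50), (2, 10, 90), (3, 20, 40), (4, 20, 30), (5, 30, 90);
""")

# Scalar subquery: evaluated once, returns a single value (the overall maximum).
scalar_query = """
    SELECT order_id
    FROM orders_demo
    WHERE amount = (SELECT MAX(amount) FROM orders_demo)
    ORDER BY order_id;
"""

# Correlated subquery: references o.cust_id from the outer query,
# so it is evaluated per outer row (maximum per customer).
correlated_query = """
    SELECT o.order_id
    FROM orders_demo o
    WHERE o.amount = (SELECT MAX(i.amount)
                      FROM orders_demo i
                      WHERE i.cust_id = o.cust_id)
    ORDER BY o.order_id;
"""

top_overall = [r[0] for r in conn.execute(scalar_query)]
top_per_customer = [r[0] for r in conn.execute(correlated_query)]
print(top_overall)       # [2, 5]
print(top_per_customer)  # [2, 3, 5]
```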

Example: This interview question from Meta asks you to build a recommendation system for Facebook. For each user, find the pages that the user doesn't follow but at least one of their friends does. The output should contain the ID of the user and the ID of the page that should be recommended to that user.

The outer query returns all user-page pairs where the page is followed by at least one of the user's friends.

Then, we use a subquery in the WHERE clause to remove pages that the user already follows. The subquery has two conditions: one restricts it to pages followed by that particular user, and the other checks whether the page being considered for recommendation is among them.

Since the subquery returns the pages the user already follows, using NOT EXISTS in WHERE excludes all of those pages from the recommendations.

SELECT DISTINCT f.user_id,
                p.page_id
FROM users_friends f
JOIN users_pages p ON f.friend_id = p.user_id
WHERE NOT EXISTS
    (SELECT *
     FROM users_pages pg
     WHERE pg.user_id = f.user_id
       AND pg.page_id = p.page_id);

Here is the output.

user_id	page_id
1	23
1	24
1	28
…	…
5	25

Business Use:

Customer activity: Latest login per user, latest subscription change.
Sales: Highest order per customer, highest revenue order per region.
Product Performance: The most purchased products in each category, the highest revenue products in each month.
User behavior: longest session per user, first purchase per customer.
Reviews and opinions: Top reviewer, latest reviews for every product.
Operations: Latest shipment status per order, fastest delivery time in every region.

# Pattern #3: Ranking with window functions

Using window functions such as ROW_NUMBER(), RANK(), and DENSE_RANK() allows you to order rows within data partitions and then identify the first, second, or nth record.

Here is what each of these ranking window functions does:

ROW_NUMBER(): Assigns a unique sequence number within each partition. Tied values get different row numbers.
RANK(): Assigns the same rank to tied values and skips the following ranks for the next non-tied value.
DENSE_RANK(): Same as RANK(), only it doesn't skip ranks after ties.
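
The tie-handling differences are easiest to see side by side. The sketch below uses a hypothetical `scores_demo` table with one tie, run on SQLite via Python's `sqlite3` module (this assumes a bundled SQLite of version 3.25 or later, which is when window functions were added). The `name` tiebreaker in ROW_NUMBER() is only there to make the output deterministic.

```python
import sqlite3

# Hypothetical toy data with a tie at score 80, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores_demo (name TEXT, score INTEGER);
    INSERT INTO scores_demo VALUES ('Ann', 90), ('Bob', 80), ('Cid', 80), ('Dee', 70);
""")

query = """
    SELECT name,
           score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM scores_demo
    ORDER BY score DESC, name;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Ann', 90, 1, 1, 1)
# ('Bob', 80, 2, 2, 2)
# ('Cid', 80, 3, 2, 2)   <- tie: same RANK and DENSE_RANK, different ROW_NUMBER
# ('Dee', 70, 4, 4, 3)   <- RANK skips 3, DENSE_RANK does not
```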

Example: In an Amazon interview question, we need to find the customers with the highest daily total order cost between 2019-02-01 and 2019-05-01. If a customer has multiple orders on a given day, sum the order costs per day. The output should contain the customer's first name, the total cost of their orders, and the order date.

In the first common table expression (CTE), we find the orders between the specified dates and get each customer's daily total for each date.

In the second CTE, we use RANK() to rank customers for each date by their daily total, in descending order.

Finally, we join the second CTE with the customers table to output the required columns and keep only the rows ranked first, i.e., the highest daily total.

WITH customer_daily_totals AS (
  SELECT o.cust_id,
         o.order_date,
         SUM(o.total_order_cost) AS total_daily_cost
  FROM orders o
  WHERE o.order_date BETWEEN '2019-02-01' AND '2019-05-01'
  GROUP BY o.cust_id, o.order_date
),

ranked_daily_totals AS (
  SELECT cust_id,
         order_date,
         total_daily_cost,
         RANK() OVER (PARTITION BY order_date ORDER BY total_daily_cost DESC) AS rnk
  FROM customer_daily_totals
)

SELECT c.first_name,
       rdt.order_date,
       rdt.total_daily_cost AS max_cost
FROM ranked_daily_totals rdt
JOIN customers c ON rdt.cust_id = c.id
WHERE rdt.rnk = 1
ORDER BY rdt.order_date;

Here is the output.

first_name	order_date	max_cost
Mia	2019-02-01	100
Frieda	2019-03-01	80
Mia	2019-03-01	80
…	…	…
Frieda	2019-04-23	120

Business Use:

User Activity: “Most active users last month”.
Revenue: “Second highest revenue generating region”.
Product Popularity: “Top 10 Best Selling Products”.
Purchases: “Every customer’s first purchase”.

# Pattern #4: Calculating Moving Averages and Cumulative Amounts

A moving (rolling) average calculates the average over the current row and the N rows before it, where rows usually represent months or days. It is calculated using AVG() as a window function with the window frame ROWS BETWEEN N PRECEDING AND CURRENT ROW.

The cumulative sum (running total) is the sum from the first row to the current row, which is reflected in the window frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW used with the SUM() window function.
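
As a quick sketch of both window frames before the full interview example, the snippet below computes a 3-row moving average and a running total on a hypothetical `monthly_demo` table (names and values are assumptions). It uses SQLite via Python's `sqlite3` module, assuming SQLite 3.25 or later for window function support; the frame syntax is the same as in PostgreSQL.

```python
import sqlite3

# Hypothetical toy data, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_demo (month TEXT, revenue INTEGER);
    INSERT INTO monthly_demo VALUES
        ('2020-01', 10), ('2020-02', 20), ('2020-03', 30), ('2020-04', 40);
""")

query = """
    SELECT month,
           -- current row plus the two rows before it: a 3-month moving average
           AVG(revenue) OVER (ORDER BY month
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_3m,
           -- everything from the first row up to the current row: a running total
           SUM(revenue) OVER (ORDER BY month
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM monthly_demo
    ORDER BY month;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2020-01', 10.0, 10)
# ('2020-02', 15.0, 30)
# ('2020-03', 20.0, 60)
# ('2020-04', 30.0, 100)
```

Note that the first two rows average over fewer than three months, because the frame simply contains fewer rows at the start of the series.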

Example: This Amazon interview question asks us to find the 3-month rolling average of total revenue from purchases. We should output the year-month (YYYY-MM) and the 3-month rolling average, sorted from the earliest to the latest month.

Also, returns (negative purchase values) should not be included.

We use a subquery to calculate monthly revenue with SUM() and convert the purchase date to YYYY-MM format with the TO_CHAR() function.

Then, we use AVG() to calculate the moving average. In the OVER() clause, we order the data by month and define the window frame ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. This gives a 3-month moving average, which takes into account the current month and the previous two.

SELECT t.month,
       AVG(t.monthly_revenue) OVER(ORDER BY t.month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_revenue
FROM
  (SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
          SUM(purchase_amt) AS monthly_revenue
   FROM amazon_purchases
   WHERE purchase_amt > 0
   GROUP BY 1
   ORDER BY 1) AS t
ORDER BY t.month ASC;

Here is the output.

month	avg_revenue
2020-01	26292
2020-02	23493.5
2020-03	25535.666666666668
…	…
2020-10	21211

To calculate the cumulative sum, we do it like this.

SELECT t.month,
       SUM(t.monthly_revenue) OVER(ORDER BY t.month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sum
FROM
  (SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
          SUM(purchase_amt) AS monthly_revenue
   FROM amazon_purchases
   WHERE purchase_amt > 0
   GROUP BY 1
   ORDER BY 1) AS t
ORDER BY t.month ASC;

Here is the output.

month	cum_sum
2020-01	26292
2020-02	46987
2020-03	76607
…	…
2020-10	239869

Business Use:

Engagement metrics: 7-day moving average of DAU or messages sent, cumulative cancellations.
Financial KPIs: 30-day moving average of prices/conversions/stock prices, revenue reporting (cumulative YTD).
Product performance: Moving average of logins per user, cumulative app installs.
Operations: Cumulative orders shipped, tickets resolved, bugs closed.

# Pattern #5: Applying Conditional Aggregations

Conditional aggregation lets you compute multiple segmented metrics in a single pass by placing a CASE WHEN expression inside aggregate functions.
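
A minimal sketch of the idea, assuming a hypothetical `orders_status_demo` table (the table, columns, and values are made up for illustration). It runs on SQLite via Python's `sqlite3` module so no PostgreSQL server is needed; the CASE WHEN syntax is the same in both.

```python
import sqlite3

# Hypothetical toy data, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_status_demo (region TEXT, status TEXT, amount INTEGER);
    INSERT INTO orders_status_demo VALUES
        ('EU', 'completed', 100),
        ('EU', 'refunded',   40),
        ('EU', 'completed',  60),
        ('US', 'completed',  80),
        ('US', 'canceled',   20);
""")

query = """
    SELECT region,
           COUNT(*)                                                    AS total_orders,
           SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END)       AS completed_orders,
           SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END)  AS completed_revenue,
           SUM(CASE WHEN status = 'refunded'  THEN amount ELSE 0 END)  AS refunded_amount
    FROM orders_status_demo
    GROUP BY region
    ORDER BY region;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('EU', 3, 2, 160, 40)
# ('US', 2, 1, 80, 0)
```

Each CASE WHEN acts as a per-row switch, so one scan of the table yields several differently filtered metrics instead of one query per segment.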

Example: An Amazon interview question asks you to identify active users by finding users who made a second purchase within 1 to 7 days of their first purchase. The output should contain only those user IDs. Same-day purchases should be ignored.

The first CTE identifies users and their purchase dates, collapsing same-day purchases into one row using the DISTINCT keyword.

The second CTE ranks each user's purchase dates from oldest to newest.

The final CTE finds the first and second purchase dates for each user using conditional aggregation. We use MAX() to pick out the single non-NULL value for the first and for the second purchase date.

Finally, we use the result of the last CTE and keep only those users who made a second (non-NULL) purchase within 1 to 7 days of their first purchase.

WITH daily AS (
  SELECT DISTINCT user_id,
         created_at::DATE AS purchase_date
  FROM amazon_transactions
),

ranked AS (
  SELECT user_id,
         purchase_date,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY purchase_date) AS rn
  FROM daily
),

first_two AS (
  SELECT user_id,
         MAX(CASE WHEN rn = 1 THEN purchase_date END) AS first_date,
         MAX(CASE WHEN rn = 2 THEN purchase_date END) AS second_date
  FROM ranked
  WHERE rn <= 2
  GROUP BY user_id
)

SELECT user_id
FROM first_two
WHERE second_date IS NOT NULL AND (second_date - first_date) BETWEEN 1 AND 7
ORDER BY user_id;

Here is the output.

user_id
100
103
105
…
143

Business Use:

Subscription reporting: Paid vs. free users, active vs. inactive users by plan tier.
Marketing funnel dashboards: Signups vs. purchasers by traffic source, emails opened vs. clicked vs. converted.
E-commerce: Completed vs. refunded vs. canceled orders by region, new vs. returning buyers.
Product analysis: iOS vs. Android vs. web usage, feature adopted vs. not adopted in each group.
Finance: Revenue from new vs. existing customers, gross vs. net revenue.
A/B testing and experiments: Control vs. treatment metrics.

# Conclusion

If you want a job at FAANG (and other) companies, pay attention to these five SQL patterns for interviews. Of course, they aren't the only SQL concepts that get tested, but they are the ones tested most often. By focusing on them, you make your preparation for most SQL interviews at FAANG companies as effective as possible.

Nate Rosidi is a data scientist and product strategist. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview tips, shares data science projects, and covers everything SQL.
