Identifying Users with Purchase Frequency Spikes

SQL coding challenge · Difficulty: hard · +200 XP

Table: purchases

+---------------+---------+

| Column        | Type    |

+---------------+---------+

| purchase_id   | INT     |
| user_id       | INT     |
| purchase_date | DATE    |

+---------------+---------+

Problem

-------

Find users whose purchase count in their

MOST RECENT month is AT LEAST DOUBLE the

average count of their previous 2 months.

Only consider users with purchases in at

least 3 distinct months.

Return: user_id, increase_period (YYYY-MM)

Order: user_id ASC

Step-by-Step Approach

---------------------

1. Group purchases by user_id + YYYY-MM

2. Rank months per user (latest = rank 1)

3. Keep only top 3 months per user

4. Compare: rank=1 count >= 2 × avg(rank 2,3)

Example Input

-------------

+----+---------+---------------+

| id | user_id | purchase_date |

+----+---------+---------------+

|  1 |   101   | 2024-01-05    |
|  2 |   101   | 2024-01-10    |← Jan: 2
|  3 |   101   | 2024-02-10    |← Feb: 2
|  4 |   101   | 2024-02-22    |
|  5 |   101   | 2024-03-01    |
|  6 |   101   | 2024-03-08    |← Mar: 6
|  7 |   101   | 2024-03-12    |
|  8 |   101   | 2024-03-19    |
|  9 |   101   | 2024-03-24    |
| 10 |   101   | 2024-03-28    |

+----+---------+---------------+

Calculation for user 101:

Latest (Mar): 6 purchases

Prev 2 avg: (2 + 2) / 2 = 2.0

6 >= 2 × 2.0? YES → SPIKE!

Expected Output

+---------+-----------------+

| user_id | increase_period |

+---------+-----------------+

|   101   |    2024-03      |

+---------+-----------------+

Hint

----

WITH monthly AS (

SELECT user_id,

DATE_FORMAT(purchase_date,'%Y-%m') AS period,

COUNT(*) AS cnt

FROM purchases

GROUP BY user_id, period

)

Solve this challenge on PySpark.in