ROW_NUMBER: Remove Duplicate Rows and Keep One Per Order

SQL coding challenge · Difficulty: easy · +50 XP

Problem

A data pipeline bug created duplicate rows in the orders table. Clean it by keeping exactly one row per duplicate group.

Tables

Table: orders_dup

| order_id | customer_name | amount |
| 1 | Krishna | 500 |
| 1 | Krishna | 500 |
| 2 | Parvati | 800 |
| 2 | Parvati | 800 |
| 3 | Atul | 300 |

Expected Output (deduplicated)

| order_id | customer_name | amount |
| 1 | Krishna | 500 |
| 2 | Parvati | 800 |
| 3 | Atul | 300 |
  • Return: order_id, customer_name, amount
  • Keep only one row per duplicate group
  • Sort by order_id ascending
  • Hint: Use ROW_NUMBER() OVER (PARTITION BY order_id, customer_name, amount ORDER BY order_id) then filter where rn = 1

Solve this challenge on PySpark.in