Current and Previous Address Tracking with SCD Type 2

SQL coding challenge · Difficulty: medium · +100 XP

Table: address_history

+--------------+---------+

| Column       | Type    |

+--------------+---------+

| customer_id  | INT     |
| address      | VARCHAR |
| start_date   | DATE    |
| end_date     | DATE    |

+--------------+---------+

• end_date = NULL means current address

• Each customer can have multiple rows

(one per address change)

Problem

-------

This is a Slowly Changing Dimension Type 2

table. For each customer, find:

• current_address (where end_date IS NULL)

• previous_address (the row just before

the current one by start_date)

Return: customer_id, current_address,

previous_address

(NULL if no previous address)

Order: customer_id ASC

Example Input

-------------

+------+-----------+-----------+-----------+

| c_id | address   | start     | end       |

+------+-----------+-----------+-----------+

|  101 | Mumbai    | 2020-01-01| 2022-06-30|
|  101 | Delhi     | 2022-07-01| 2023-12-31|
|  101 | Bangalore | 2024-01-01|   NULL    |
|  102 | Chennai   | 2021-03-15|   NULL    |

+------+-----------+-----------+-----------+

Expected Output

+------+-----------+----------+

| c_id | current   | previous |

+------+-----------+----------+

|  101 | Bangalore | Delhi    |
|  102 | Chennai   | NULL     |

+------+-----------+----------+

Hint

----

LAG(address) OVER (

PARTITION BY customer_id

ORDER BY start_date

)

Solve this challenge on PySpark.in