Extract Error Codes from Log Strings

Python coding challenge · Difficulty: medium · +100 XP

Function Signature

------------------

def extract_errors(logs: list[str]) -> dict:

Problem

-------

Given a list of log strings, extract all error codes and return a count dictionary.

Error code format: E followed by 4 digits

Examples: E1001, E2034, E9999
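
To make the format concrete, here is a minimal sketch that applies the pattern listed under Constraints with `re.findall`. The name `ERROR_CODE` and the sample strings are illustrative assumptions, not part of the challenge data. Because the pattern is unanchored, it also matches the first five characters of a longer token such as E12345. The challenge does not say whether such a token should count, so treat that behavior as an open question rather than a requirement.

```python
import re

# Pattern taken from the Constraints section: "E" followed by 4 digits.
# ERROR_CODE is a hypothetical name chosen for this sketch.
ERROR_CODE = re.compile(r"E\d{4}")

# Illustrative sample lines (assumptions, not challenge data).
print(ERROR_CODE.findall("2024-01-02 WARN E1001 E3001 retry"))  # ['E1001', 'E3001']
print(ERROR_CODE.findall("INFO: all systems normal"))           # []

# Unanchored match: only the first "E" + 4 digits of a longer token is returned.
print(ERROR_CODE.findall("token E12345"))                       # ['E1234']
```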

Rules:

• A single log line can contain multiple error codes

• Count total occurrences across all lines

• Return {error_code: count} sorted by count DESC, then code ASC

Example 1

---------

Input:

[

"2024-01-01 ERROR E1001 disk full",

"2024-01-01 ERROR E2034 timeout",

"2024-01-02 ERROR E1001 disk full",

"2024-01-02 WARN E1001 E3001 retry"

]

Output:

{"E1001": 3, "E2034": 1, "E3001": 1}

Example 2

---------

Input:

["INFO: all systems normal",

"DEBUG: connection ok"]

Output: {} (no error codes)

Example 3 (Edge)

----------------

Input: []

Output: {}

Constraints

-----------

• 0 <= len(logs) <= 50,000

• Each log line <= 500 characters

• Use regex: r'E\d{4}'
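
One possible approach, sketched under the constraints above: collect matches with `re.findall`, tally them with `collections.Counter`, then sort by count descending and code ascending. This is not an official solution. It assumes Python 3.9+ (for the `list[str]` annotation) and relies on plain dicts preserving insertion order, which is how the "sorted" requirement is interpreted here. The sample input in the usage lines is made up for illustration.

```python
import re
from collections import Counter

# Pattern from the Constraints section.
ERROR_CODE = re.compile(r"E\d{4}")


def extract_errors(logs: list[str]) -> dict:
    """Count error codes across all log lines (a sketch, not a reference answer)."""
    counts = Counter()
    for line in logs:
        # findall returns every non-overlapping match, so a line with
        # several codes contributes several counts.
        counts.update(ERROR_CODE.findall(line))

    # Sort by count descending, then by code ascending.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    # Dicts keep insertion order, so the result iterates in sorted order.
    return dict(ordered)


# Hypothetical input: E2034 and E3001 tie on count, so they are ordered by code.
result = extract_errors(["E3001 E1001", "E2034 E1001"])
print(result)  # {'E1001': 2, 'E2034': 1, 'E3001': 1}
```

Each line is scanned once, so the work grows linearly with the total input size, plus a sort over the distinct codes. That fits comfortably within 50,000 lines of at most 500 characters.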
