ETL Pipeline Notebook
Step 1

Upload Your Raw Data

Choose a data file from your computer to begin the ETL pipeline. It will be stored securely in the Bronze layer for processing.

CSV · Parquet · JSON · ORC · XML · TXT
How it works
1. Upload: Select your raw CSV, Parquet, or JSON file
2. Bronze: Ingest and inspect raw data as-is
3. Silver: Clean, filter, and transform the data
4. Gold: Aggregate and export business-ready data
Run Pipeline: Click "Run Pipeline" in the header to execute all layers at once.
Bronze
Raw Data Ingestion
Load your files exactly as-is. No transformation here — just bring the data in and explore it.
1. Upload a file using the button above
2. Find it in My Files below → click Load
3. Run the cell → creates df_bronze for Silver to use
Input: CSV · Parquet · JSON · ORC
Output: df_bronze (raw DataFrame → Silver)
My Files
Upload a file above to get started. Only you can see these files.


Online PySpark compiler FAQ

Helpful answers for people searching for a PySpark playground, ETL notebook, or browser-based Spark practice.

Can I run PySpark online without installing Spark?

Yes. This notebook lets you run PySpark in the browser, so you can test code, explore DataFrames, and practice transformations without a local Spark setup.

Is this useful for ETL pipeline practice?

Yes. The workspace is organized around bronze, silver, and gold layers so you can practice raw ingestion, cleaning, transformation, aggregation, and export in one flow.
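
To make the layer flow concrete, here is a minimal plain-Python sketch of raw ingestion, cleaning, and aggregation. It uses only the standard library rather than Spark, and the sample data, column names, and the functions bronze, silver, and gold are all hypothetical; in the notebook each step would more likely operate on a DataFrame.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw upload; columns and values are made up for illustration.
RAW_CSV = """order_id,region,amount
1,north,10.50
2,south,
3,north,4.50
4, South ,20.00
"""

def bronze(raw_text):
    """Bronze: load rows exactly as-is; every value is still a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def silver(rows):
    """Silver: drop rows with a missing amount, normalise region, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows where the amount is empty
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": float(amount),
        })
    return cleaned

def gold(rows):
    """Gold: aggregate the total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

bronze_rows = bronze(RAW_CSV)
silver_rows = silver(bronze_rows)
gold_totals = gold(silver_rows)
print(gold_totals)  # {'north': 15.0, 'south': 20.0}
```

Each function takes the previous layer's output and returns a new value, so the raw rows stay available for inspection after cleaning and aggregation.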

What languages can I use in the notebook?

You can work with PySpark, Python, SQL, Pandas, Matplotlib, and NLP cells, which makes the page useful for end-to-end data engineering experiments.

Who should use this online compiler?

It is a strong fit for beginners learning Spark, data engineers prototyping ETL logic, and interview candidates who want a fast PySpark playground.
