Test Data Based on Real Data in PySpark
Quickly build test datasets that mirror the schemas of your real data but contain only fake values.
Running PySpark from Docker
A very basic Docker setup for running a Jupyter Notebook and a Spark server with the Spark UI, so you can try out new ideas and test PySpark locally without expensive infrastructure.
Emil Moe

Software and Data Engineer

