


In case you need a refresher, a quick introduction might be handy.

You do not have to be an expert, but you need to know how to start a Command Prompt and run commands such as those that help you move around your computer’s file system. I am also assuming that you are comfortable working with the Command Prompt on Windows. So the screenshots are specific to Windows 10. In this post, I describe how I got started with PySpark on Windows. Spark supports a Python programming API called PySpark that is actively maintained and was enough to convince me to start learning PySpark for working with big data. While I had heard of Apache Hadoop, to use Hadoop for working with big data, I had to write code in Java which I was not really looking forward to as I love to write code in Python. I decided to teach myself how to work with big data and came across Apache Spark.
