Episode #12: PySpark

Featuring PySpark Developer: Holden Karau

Air Date 18 January 2019

@12 PM Eastern

We will be joined by Holden Karau, who will tell us about the future of PySpark. PySpark is the Python API for Spark. It exposes the Spark programming model to Python, helping data scientists work with Resilient Distributed Datasets (RDDs) in Apache Spark. Py4J is a popular library integrated within PySpark that lets Python interface dynamically with JVM objects, such as the objects backing RDDs. Apache Spark comes with an interactive shell for Python, as it does for Scala.