The Dask Training Workshop Empowers Data Teams to Run Parallel Analytics at Scale
Dask Training Workshop Overview
This one-day course introduces Dask for scaling data analysis in Python. The workshop covers four topics: where NumPy and Pandas reach their technical limits, the fundamentals of parallel computing in Python, practical use of Dask dataframes, and machine learning with Dask.
We assume participants have prior experience with Python and, in particular, with its standard data analysis tools (notably NumPy, Pandas, Scikit-Learn, and Jupyter). No prior exposure to Dask or to parallel computing is required.
At the conclusion of this course, participants will be able to:
● Explain relevant parallel computing concepts in the context of data analysis pipelines.
● Identify opportunities for parallel computation in existing Python data workflows.
● Extend example Pandas/NumPy analysis pipelines to become scalable using Dask.
● Construct scalable data analysis pipelines from scratch in Python using Dask.
● Apply Dask dashboard tools to monitor the performance of data analytics workloads.
● Adapt existing Scikit-Learn machine learning processes to use relevant Dask-ML idioms.
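As a taste of the parallel computing concepts listed above, here is a minimal sketch of the split, map, and combine pattern that underlies partitioned computation. It is not workshop material and it deliberately uses only the Python standard library rather than Dask; the function names (split_into_chunks, partial_sum_and_count, chunked_mean) are hypothetical and chosen for illustration. A Dask dataframe applies the same idea by partitioning a dataset into many Pandas dataframes and combining per-partition results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_and_count(chunk):
    """Per-chunk work: return the pieces needed to combine into a mean."""
    return sum(chunk), len(chunk)


def chunked_mean(values, n_chunks=4):
    """Compute a mean by mapping over chunks and combining partial results."""
    if not values:
        raise ValueError("cannot take the mean of an empty sequence")
    chunks = split_into_chunks(values, n_chunks)
    # Each chunk is processed independently, so the work can be scheduled
    # concurrently. Threads are used here only to show the structure.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    # Combine step: a mean cannot be averaged chunk by chunk when chunk
    # sizes differ, so sums and counts are carried separately.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 101))
result = chunked_mean(values)
print(result)
```

The design point is the combine step: the per-chunk function returns a sum and a count instead of a chunk mean, because only those partial results can be merged correctly across chunks of unequal size. Threads in CPython give little speedup for pure-Python arithmetic like this, so treat the sketch as a picture of the structure, not a performance technique.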