Updated: Mar 18
At Quansight, we have developed QHub, a framework for deploying data science stacks that simplifies both initial setup and ongoing maintenance. We use QHub actively in multiple client projects to run data science and machine learning workloads. QHub uses Terraform to make the deployment of JupyterHub, JupyterLab, Conda environments, and Dask on a Kubernetes cluster declarative. It provides Linux permissions so that multiple users and groups can collaborate easily. QHub also ships with a Jitsi videoconferencing plugin enabled, which makes it an excellent platform for live collaboration and training. Deployment is a one-step process powered by GitHub Actions: you change a single configuration file, merge or push it to the GitHub repository, and you are done. Currently, QHub is well supported on AWS, GCP, and DigitalOcean.
We needed to get GPUs up and running for a recent client training engagement featuring PyTorch and OpenCV. Here we summarize our approach to enabling GPU support for QHub on GCP.
We decompose the problem into smaller, more manageable subproblems:
Assigning GPU quotas on GCP
Adding GPUs to the Kubernetes nodes
Installing NVIDIA drivers on each node
Adding a profile to JupyterHub
Managing scheduling issues
Verifying the Conda environment