Deploying GPUs with QHub

Updated: Mar 18

At Quansight, we have developed QHub, a framework for deploying data science stacks that simplifies their initialization and maintenance. Quansight uses QHub actively in multiple client projects as a tool for running data science and machine learning workloads. QHub uses Terraform to make the deployment of JupyterHub, JupyterLab, Conda environments, and Dask on a Kubernetes cluster declarative. It provides Linux permissioning to facilitate easy collaboration among multiple users and groups. QHub also ships with a Jitsi videoconferencing plugin, which makes it an excellent platform for live collaboration and training. Deployment is a one-step process powered by GitHub Actions: you change a single configuration file, merge or push it to the GitHub repository, and you are done. Currently, QHub has good support for AWS, GCP, and DigitalOcean.
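
For concreteness, everything QHub deploys is described in a single qhub-config.yaml checked into the repository. The sketch below is illustrative only: the project name, GCP project ID, region, and machine types are placeholders, and the exact schema depends on the QHub version you are running, so consult the QHub documentation for your release.

    # Illustrative qhub-config.yaml sketch -- placeholder values throughout;
    # the exact schema depends on your QHub version.
    project_name: gpu-training            # hypothetical project name
    provider: gcp                         # target Google Cloud Platform
    ci_cd: github-actions                 # GitHub Actions drives deployment
    google_cloud_platform:
      project: my-gcp-project-id          # placeholder GCP project ID
      region: us-central1
      node_groups:
        general:                          # CPU-only pool for core services
          instance: n1-standard-4
          min_nodes: 1
          max_nodes: 1
        user:                             # pool backing JupyterLab sessions
          instance: n1-standard-2
          min_nodes: 1
          max_nodes: 4

With a file like this in place, pushing a change to it triggers the GitHub Actions workflow that re-runs the Terraform deployment.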


Problem statement


We needed to get GPUs up and running for a recent client training engagement focused on PyTorch and OpenCV. Here, we summarize our approach to enabling GPU support for QHub on GCP.


Solution


We decompose the problem into smaller, more manageable subproblems (a configuration sketch tying them together follows the list):

  1. Assigning GPU quotas on GCP

  2. Adding GPUs to the Kubernetes nodes

  3. Installing Nvidia drivers on each node

  4. Adding a profile to JupyterHub

  5. Managing scheduling issues

  6. Verifying the Conda environment
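
As a rough preview of where these pieces land, the hedged sketch below shows the two GPU-related additions to qhub-config.yaml that the following sections build up to: a GPU node group and a JupyterLab profile that requests a GPU. Key names such as guest_accelerators and the kubespawner_override fields reflect the QHub release we used and may differ in newer versions; the accelerator type, machine type, image, and resource limits are placeholders.

    # Preview sketch only -- placeholder values; details in the sections below.
    google_cloud_platform:
      node_groups:
        gpu:                              # dedicated GPU node pool
          instance: n1-standard-8         # placeholder machine type
          min_nodes: 0                    # scale to zero when no GPU users
          max_nodes: 2
          guest_accelerators:             # GPUs attached to each node
            - name: nvidia-tesla-t4       # placeholder accelerator type
              count: 1

    profiles:
      jupyterlab:
        - display_name: GPU Instance      # appears in the JupyterHub spawner menu
          description: PyTorch and OpenCV with one GPU
          kubespawner_override:
            image: quansight/qhub-jupyterlab:latest   # placeholder image
            cpu_limit: 4
            mem_limit: 16G
            extra_resource_limits:
              "nvidia.com/gpu": 1         # ask Kubernetes for one GPU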

Assigning GPU quotas on GCP