IT Enables Research: JupyterHub on Kubernetes

17 September, 2020

JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening them with installation and maintenance tasks. Students, researchers, data scientists, and professors teaching classes can get their work done in their own workspaces on shared resources, which IT Research Computing manages efficiently on its production Kubernetes cluster.

JupyterHub makes it possible to serve a pre-configured data science environment to any user on KAUST campus. It is customizable, scalable, and suitable for small and large teams, academic courses, and large-scale infrastructure. 

JUPYTERHUB KEY FEATURES

Customizable - JupyterHub serves a variety of environments. It supports dozens of kernels through the Jupyter server and serves a variety of user interfaces, including the classic Jupyter Notebook, JupyterLab, and RStudio, with kernels for languages such as Python, R, and Julia.
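Curious which kernels our deployment exposes? You can check from inside any notebook session; here is a minimal sketch using the standard jupyter_client API (the kernels you actually see depend on the image we deploy):

    # List the kernels available in the current session.
    # The exact set depends on the image IT Research Computing deploys.
    from jupyter_client.kernelspec import KernelSpecManager

    specs = KernelSpecManager().get_all_specs()
    for name, info in specs.items():
        print(f"{name:12s} -> {info['spec']['display_name']}")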

Flexible - JupyterHub has KAUST Active Directory authentication enabled, so you can sign in with your KAUST Connect credentials.
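For readers who wonder how such an integration is wired up: JupyterHub authentication is pluggable and configured in an ordinary Python file (jupyterhub_config.py). The snippet below is only an illustrative sketch built on the community ldapauthenticator plugin; it is not our production configuration, and the server address and bind template are placeholders.

    # jupyterhub_config.py -- illustrative sketch only, not our production setup.
    # JupyterHub delegates login to a pluggable authenticator class; an LDAP /
    # Active Directory integration typically looks something like this.
    c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
    c.LDAPAuthenticator.server_address = "ad.example.kaust.edu.sa"   # placeholder
    c.LDAPAuthenticator.bind_dn_template = [
        "uid={username},ou=people,dc=example,dc=sa",                 # placeholder
    ]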

Scalable - JupyterHub is container-friendly, making it possible to deploy it with modern container technology. IT Research Computing runs it on its production Kubernetes cluster so it can scale to hundreds of users.

Portable - JupyterHub is entirely open-source and designed to run on a variety of infrastructure, including commercial cloud providers, virtual machines, and even your own laptop.

IT RESEARCH COMPUTING JUPYTER NOTEBOOK OVERVIEW

Oh, man! You have read this far and still no link to JupyterHub. OK, we feel you; here is the link for the impatient: https://jupyter.kaust.edu.sa. This instance runs on IT Research Computing’s production Kubernetes cluster. Each user gets two cores and four GB of RAM. All Jupyter Notebooks have access to the following storage systems:

  • Noor Home, which gives you 200 GB of backed-up storage

  • DataWaha, which gives your group petabytes of backed-up storage

  • The Shaheen Lustre filesystem (available soon; we are working with Canonical to fix a bug)

  • 10 GB of persistent storage, where packages you installed in previous sessions remain available

Having all those storage systems accessible from a Jupyter Notebook is a cool thing. You can access your files without having to move them around campus; just process them where they lie. All of IT Research Computing’s infrastructure nodes are connected to these storage systems via 10G links (thanks to the formidable IT Networks Team). Massive amounts of bandwidth are available for reading and writing data thanks to these fast and reliable connections.
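In practice, working with these filesystems from a notebook is just ordinary file I/O. Here is a minimal sketch that reads a CSV with pandas and writes a processed copy back in place; the paths are placeholders, so check the file browser in your own session for the real mount points.

    # Process data where it lives: plain file I/O against the mounted filesystems.
    # The paths below are placeholders -- the real mount points are visible in
    # the file browser of your session.
    import os
    import pandas as pd

    data_dir = os.path.expanduser("~/my_project")    # e.g. somewhere in your Noor Home
    df = pd.read_csv(os.path.join(data_dir, "measurements.csv"))

    print(df.describe())    # quick summary, no copying data around campus
    df.to_hdf(os.path.join(data_dir, "measurements.h5"), key="measurements")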

IT RESEARCH COMPUTING JUPYTER NOTEBOOK FEATURES

IT Research Computing’s Jupyter Notebook image includes libraries for data analysis from the Julia, Python, and R communities, as well as TensorFlow; a short example using some of them follows the list.

  • The Julia compiler and base environment 

  • IJulia to support Julia code in Jupyter notebooks 

  • HDF5, Gadfly, and RDatasets packages 

  • dask, pandas, numexpr, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh, sqlalchemy, hdf5, vincent, beautifulsoup, protobuf, xlrd, bottleneck, and pytables packages

  • ipywidgets and ipympl for interactive visualizations and plots in Python notebooks 

  • Facets for visualizing machine learning datasets 

  • The R interpreter and base environment 

  • IRKernel to support R code in Jupyter notebooks

  • tidyverse packages, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, lubridate, and broom from conda-forge 

  • devtools, shiny, rmarkdown, forecast, rsqlite, nycflights13, caret, tidymodels, rcurl, and randomforest packages from conda-forge 

  • TeX Live for notebook document conversion 

  • git, emacs-nox, vim-tiny, jed, nano, tzdata, and unzip 

  • TensorFlow and Keras machine learning libraries 
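To give a feel for what works out of the box, here is a small end-to-end example that uses only packages from the list above (scikit-learn and matplotlib); nothing needs to be installed first.

    # Load a bundled dataset, fit a model, and plot the result -- using only
    # packages that ship in the image.
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression

    X, y = load_diabetes(return_X_y=True)
    model = LinearRegression().fit(X, y)

    plt.scatter(y, model.predict(X), s=10)
    plt.xlabel("observed")
    plt.ylabel("predicted")
    plt.title("Linear regression on the diabetes dataset")
    plt.show()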

ROADMAP

IT Research Computing is continuously looking for ways to improve its products and services. This means that we already have in mind what comes next, or at least where we would like to be.

Here is a non-exhaustive list of things we will be working on to improve JupyterHub for the KAUST community:

  • Allow users and/or research groups to spin up a Jupyter instance based on their own image. BinderHub will help in that regard. Stay tuned for the launch of this service.

  • Add GPU support for all Jupyter Notebooks. Two things are holding this back today:

      • Money (duh!), since we must buy more GPU cards to satisfy the demand.

      • The need to reconfigure our on-premises cloud to use virtual GPU technology from NVIDIA.

  • Add more Kubernetes workers dedicated to JupyterHub. 

  • Create an interface to KAUST clusters, i.e. run Jupyter workers on HPC nodes.

  • Add Kubeflow, a platform for data scientists who want to build and experiment with ML pipelines. Kubeflow is also for ML engineers and operational teams who want to deploy ML systems to various environments for development, testing, and production-level serving.

HOW DOES THIS BENEFIT MY RESEARCH?

Reading this far, you might have wondered what the bottom line is for you: how does JupyterHub help your research? Let us take a look.

All in one place – The Jupyter Notebook is a web-based interactive environment. It combines code, rich text, images, mathematical equations, plots, maps (and much more!) into one document. 
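As a tiny illustration, a single cell can carry computation, typeset mathematics, and a plot at once (numpy ships alongside the packages listed earlier):

    # One notebook cell mixing typeset math, computation, and a plot.
    import numpy as np
    import matplotlib.pyplot as plt
    from IPython.display import Math, display

    display(Math(r"f(x) = e^{-x^2/2}"))    # rendered LaTeX in the output area

    x = np.linspace(-4, 4, 200)
    plt.plot(x, np.exp(-x**2 / 2))
    plt.title("A Gaussian, right below its equation")
    plt.show()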

Easy to share & convert – You can share Jupyter Notebooks as JSON files, a structured text format. You can also use built-in tools to export notebooks as PDF or HTML. 
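Conversion is handled by nbconvert, which is part of the environment (the TeX Live installation listed earlier covers PDF output). Here is a sketch of exporting a notebook to HTML from Python; the filename is a placeholder.

    # Export a notebook to HTML with nbconvert's Python API.
    # "analysis.ipynb" is a placeholder -- use one of your own notebooks.
    import nbformat
    from nbconvert import HTMLExporter

    nb = nbformat.read("analysis.ipynb", as_version=4)
    body, _resources = HTMLExporter().from_notebook_node(nb)

    with open("analysis.html", "w", encoding="utf-8") as f:
        f.write(body)

The same conversion is also available from the notebook’s File menu and the jupyter nbconvert command line.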

Language independent – Jupyter was built with this concept in mind. The client runs in your browser and connects to kernels running any supported language.

Stress-free reproducible experiments – Jupyter Notebooks help you conduct efficient and reproducible interactive computing experiments and keep a detailed record of your work. The notebook’s ease of use makes good habits easy to adopt: do all your interactive work in notebooks, put them under version control, and commit regularly. Do not forget to refactor your code into independent, reusable components.
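That last step is simpler than it sounds; here is a sketch of the pattern, with hypothetical file names.

    # analysis_utils.py -- a hypothetical module kept next to your notebooks.
    # Code that has stabilised in a notebook moves here, where it can be
    # version-controlled, tested, and reused across experiments.
    import pandas as pd

    def load_clean(path: str) -> pd.DataFrame:
        """Load a CSV and drop incomplete rows."""
        return pd.read_csv(path).dropna()

    # The notebook cell then shrinks to:
    #   from analysis_utils import load_clean
    #   df = load_clean("measurements.csv")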

Effective teaching/learning tool – The Jupyter Notebook is not only a tool for scientific research and data analysis but also a great tool for teaching. You can share a GitLab repo with your students. You can interactively experiment during class. Infinite possibilities! 

GitOps@KAUST

IT Research Computing manages almost all of its products and services using GitOps principles, and JupyterHub is no different. We have automated GitLab pipelines that push changes to all Kubernetes workers. We also have alerting set up to monitor both Kubernetes and JupyterHub; our goal is to detect issues before users do. We try to avoid downtime, and that is one of the reasons JupyterHub runs on Kubernetes.
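As an illustration of the kind of external check such alerting can perform (this is not our actual monitoring code), JupyterHub exposes a public REST endpoint that reports its version whenever the hub is healthy:

    # A minimal external health probe -- an illustration only, not our
    # monitoring code. JupyterHub answers on /hub/api with its version
    # when the hub is up.
    import requests

    resp = requests.get("https://jupyter.kaust.edu.sa/hub/api", timeout=5)
    resp.raise_for_status()
    print("Hub is up, version", resp.json().get("version"))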

Run your Jupyter Notebooks at https://jupyter.kaust.edu.sa today!

KAUST Information Technology Department 

We make IT happen!