CUAHSI JupyterHub

Overview

The CUAHSI JupyterHub is a web application that allows HydroShare users to execute scientific code in the cloud. It’s hosted on the Google Cloud Platform and is maintained by the CUAHSI Compute staff. While this application can run many kinds of code, it’s primarily designed for writing, building, and running Jupyter notebooks.

“The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.” (Project Jupyter)

A Jupyter notebook is thus an enhanced computational environment that combines rich text and code execution into a single script-like container. The CUAHSI JupyterHub combines this functionality with the HydroShare data repository to provide a rich computational environment for water scientists.

This computational environment is hosted on the Google Cloud Platform and has been carefully designed to provide maximum interoperability with the HydroShare data repository. It offers free, limited computational resources primarily aimed at education and reproducible science. If you’re interested in contributing to this web application, contact Tony Castronova <acastonova@cuahsi.org>.

Getting Started

There are multiple ways to access the CUAHSI JupyterHub web application from HydroShare. The simplest is to launch it from the HydroShare Apps library (see below). This will launch an isolated, customized, cloud computing environment. In this space, you can create files and execute code from within your web browser. Any data you upload, download, or create is associated with your HydroShare account and will persist between sessions, meaning that it will be there the next time you log in. Prior to gaining access, you will be asked to join the CUAHSI JupyterHub HydroShare group (see the Access and Authentication section for details).

Another way to access the CUAHSI JupyterHub web application is using the HydroShare “Open with …” functionality (shown below). This button can be found in the top right corner of any HydroShare resource landing page. After selecting “CUAHSI JupyterHub”, a computing environment will be prepared and the content of the current HydroShare resource will be placed inside of it. This is a convenient method for executing code, data, and workflows that have been published in the HydroShare repository. 

Access and Authentication

To access the CUAHSI JupyterHub platform, you must be a member of the CUAHSI JupyterHub Group. Group membership limits system interruptions and ensures that resources are effectively curated and managed. When first accessing the application, you will be directed to the CUAHSI JupyterHub Group landing page. Request to join the group, and after admission has been granted you will be able to access the computational environment. To expedite the approval process, please ensure that your HydroShare user profile is complete and up-to-date. Contact help@cuahsi.org if you have any questions regarding this process.

Advanced: Creating Persistent Virtual Environments

The CUAHSI JupyterHub supports persistent virtual environments using the Anaconda package manager. This feature lets you create custom environments that capture the software dependencies of a particular use case and persist between sessions. The following steps demonstrate how to create a persistent virtual environment.

Launch a terminal within your JupyterHub instance. Create a new Anaconda virtual environment using the command below, making sure to install the “ipykernel” library.
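The command itself is not reproduced on this page, so the following is a minimal sketch of what it might look like, assuming the standard `conda create` syntax. The environment name `test-env` is taken from the Launcher label mentioned later in this section; the unpinned `python` package and the `--yes` flag are assumptions rather than requirements of the platform.

```shell
# Hypothetical environment name; pick any name you like.
ENV_NAME="test-env"

if command -v conda >/dev/null 2>&1; then
    # Install ipykernel alongside Python so the environment can be
    # offered as a notebook kernel. "python" is left unpinned here
    # (an assumption); pin a version if your workflow requires one.
    conda create --yes --name "$ENV_NAME" python ipykernel
else
    echo "conda is not available in this shell" >&2
fi
```

Additional packages can be appended to the same command, or installed later once the environment is active.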

Once successfully created, you should see instructions for activating and deactivating the environment: 
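With recent versions of conda, those instructions typically boil down to the two commands sketched below. This is an illustration rather than a copy of the platform's output; `test-env` is a hypothetical name and should match whatever you chose when creating the environment.

```shell
# Hypothetical name; use the name you gave your environment.
ENV_NAME="test-env"

if command -v conda >/dev/null 2>&1; then
    # Make "conda activate" available in non-interactive shells.
    # An interactive JupyterHub terminal likely does not need this line.
    . "$(conda info --base)/etc/profile.d/conda.sh"

    conda activate "$ENV_NAME"   # switch into the environment
    conda env list               # the active environment is marked with "*"
    conda deactivate             # return to the previous environment
else
    echo "conda is not available in this shell" >&2
fi
```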

This new environment is saved in `data/conda-envs` and will persist between JupyterHub sessions, so it will still be available the next time you log in. To use this environment you may need to click the refresh button in the JupyterLab toolbar. You should then be able to create new notebooks with this custom environment from the Launcher panel, e.g. by clicking on the icon showing “conda env:test-env”.

Alternatively, you can activate this virtual environment in existing notebooks via the kernel selection dialog.

Computing Hardware and Resources

Each HydroShare user is allocated 2 CPUs and 4 GB of RAM on the CUAHSI JupyterHub. Users are granted up to 5 GB of disk space, which persists until the account has been inactive for 3 months; after that, all personal data and files are permanently removed from the JupyterHub platform. Prior to data removal, users are notified via email so that data can be downloaded from the server if necessary.
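To see how close you are to the disk allocation, something like the following can be run from a JupyterHub terminal. It is a rough sketch: it assumes your persistent files live under `$HOME` (the actual location on the hub may differ) and treats 5 GB as 5 × 1024 × 1024 KB.

```shell
# Assumption: persistent files live under $HOME; adjust if the hub differs.
DATA_DIR="$HOME"
QUOTA_KB=$((5 * 1024 * 1024))   # 5 GB expressed in kilobytes (binary units)

# du -sk prints the total size in kilobytes followed by the path;
# cut keeps only the number.
USED_KB=$(du -sk "$DATA_DIR" 2>/dev/null | cut -f1)

echo "Using ${USED_KB} KB of ${QUOTA_KB} KB"
if [ "$USED_KB" -gt "$QUOTA_KB" ]; then
    echo "Over the 5 GB allocation: consider downloading or removing files."
fi
```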

Legacy JupyterHub Platform 

The Legacy CUAHSI JupyterHub web application has been in service since 2014 and will be decommissioned on June 30th, 2020. It has been replaced by a scalable, fault-tolerant, and more customizable platform hosted on the Google Cloud Platform.

To ensure no data loss, users must download all files prior to June 30th, 2020. Below are several methods for downloading data from the legacy JupyterHub system (https://jupyter.cuahsi.org):

  1. Download via the tree view interface by selecting the file of interest and clicking the “Download” button.
  2. For downloading many files (or directories), first compress them, then download the compressed archive using method 1 (above).
  3. Sync data to your HydroShare iRODS space. First, make sure that your iRODS account has been activated (see documentation). Next, open a terminal inside the CUAHSI JupyterHub and initialize your iRODS connection as shown below.
    Use the “ils” command to list files that exist in your HydroShare iRODS account.

    Use “irsync” to move data from the CUAHSI JupyterHub into your HydroShare iRODS account.
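Methods 2 and 3 above can be sketched in a terminal as follows. This is a minimal illustration, not the official procedure: the directory name `my-notebooks`, the archive name, and the target collection are hypothetical placeholders, and the iRODS portion assumes that the standard icommands (`iinit`, `ils`, `irsync`) are installed and that `iinit` is the step used to initialize the connection.

```shell
# Hypothetical directory to save; replace with your own folder.
SRC_DIR="my-notebooks"
ARCHIVE="${SRC_DIR}.tar.gz"

# Placeholder so the sketch runs end to end; harmless if the folder exists.
mkdir -p "$SRC_DIR"

# Method 2: compress the directory into a single archive, then download
# the archive from the tree view like any other file.
tar -czf "$ARCHIVE" "$SRC_DIR"

# Method 3: sync to iRODS (only attempted if the icommands are available).
if command -v iinit >/dev/null 2>&1; then
    iinit                              # prompts for connection details and password
    ils                                # list what is already in your iRODS account
    irsync -r "$SRC_DIR" "i:$SRC_DIR"  # "i:" marks the iRODS side of the transfer
else
    echo "icommands not found; skipping iRODS sync" >&2
fi
```

Running `ils` again after the sync is a quick way to confirm that the files arrived.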

Credits, Authors, Contributors and Contacts

Terms of Use

For questions, please contact help@cuahsi.org 

Tony Castronova, Dan Palmer, Neal DeBuhr