"Introduction to Jupyter Notebooks" by Steve Biggs

Presented at the University of York Physics Coding Club, 13 October 2017

Available online at: http://www-users.york.ac.uk/~snb519/coding-club-jupyter/jupyter-notebooks.html

Introduction

Preamble

This document is a guide to installing and setting up jupyter notebooks as a scientific research record (or "lab book"). It assumes a Linux-based OS, but most of it probably applies to other operating systems too. All the files for this session, including this notebook itself, are available here.

The Jupyter notebook (formerly the IPython notebook) is a web-based application. This means that you run a notebook server (by default on your local machine), connect to it using your web browser, and access all the functionality via mouse and keyboard input through your browser.

Jupyter notebooks are particularly useful as scientific lab books when you are doing computational physics and/or lots of data analysis using computational tools. This is because, with Jupyter notebooks, you can:

  • Record the code you write in a notebook as you manipulate your data. This is useful for remembering what you've done, repeating it if necessary, etc.

  • See graphs and other figures rendered directly in the notebook, so there's no more printing, cutting and pasting as with a paper notebook, or copying and pasting as with other electronic notebooks.

  • Include dynamic data visualizations, e.g. animations, which is simply not possible with a paper lab book.

  • Update the notebook (or parts thereof) with new data by re-running cells. If you want to retain a record of the previous attempt, you can copy the cell and re-run only the copy.

Obviously, if you're in the lab noting down numbers from various bits of equipment, then jupyter notebooks might not be the right tool for you. But if you're looking into your data using python or similar, then jupyter notebooks might be useful for you. I certainly find them useful - I use them as my primary research record and I don't even keep a paper lab book anymore.

Features

You can write explanatory text using markdown syntax (markdown guidance is available here).

Markdown includes inline equations using LaTeX syntax, e.g. $\mathbf{E}=-\nabla\phi$, and longer equations like this:

$\partial_t f + \mathbf{v}\cdot\mathbf{\nabla}f + {q \over m}\{\mathbf{E} + \mathbf{v} \times \mathbf{B}\}\cdot\mathbf{\nabla}_{\mathbf{v}}f = \left({\partial f \over \partial t} \right)_{\mathrm{coll}}$

You can also do inline code snippets, e.g. print("Hello world"), and longer code snippets like this:

#include<stdio.h>

int main(void) {
    printf("Hello World\n");
    return 0;
}

You can have links and images as I have throughout the rest of this document.

You can also have code cells that will actually run the code, like the following:

In [1]:
print("Hello world!")
Hello world!
In [2]:
import numpy as np
x = np.linspace(-5, 5)
y = x**2
print('x =', x)
print('y =', y)
x = [-5.         -4.79591837 -4.59183673 -4.3877551  -4.18367347 -3.97959184
 -3.7755102  -3.57142857 -3.36734694 -3.16326531 -2.95918367 -2.75510204
 -2.55102041 -2.34693878 -2.14285714 -1.93877551 -1.73469388 -1.53061224
 -1.32653061 -1.12244898 -0.91836735 -0.71428571 -0.51020408 -0.30612245
 -0.10204082  0.10204082  0.30612245  0.51020408  0.71428571  0.91836735
  1.12244898  1.32653061  1.53061224  1.73469388  1.93877551  2.14285714
  2.34693878  2.55102041  2.75510204  2.95918367  3.16326531  3.36734694
  3.57142857  3.7755102   3.97959184  4.18367347  4.3877551   4.59183673
  4.79591837  5.        ]
y = [  2.50000000e+01   2.30008330e+01   2.10849646e+01   1.92523948e+01
   1.75031237e+01   1.58371512e+01   1.42544773e+01   1.27551020e+01
   1.13390254e+01   1.00062474e+01   8.75676801e+00   7.59058726e+00
   6.50770512e+00   5.50812162e+00   4.59183673e+00   3.75885048e+00
   3.00916285e+00   2.34277384e+00   1.75968347e+00   1.25989171e+00
   8.43398584e-01   5.10204082e-01   2.60308205e-01   9.37109538e-02
   1.04123282e-02   1.04123282e-02   9.37109538e-02   2.60308205e-01
   5.10204082e-01   8.43398584e-01   1.25989171e+00   1.75968347e+00
   2.34277384e+00   3.00916285e+00   3.75885048e+00   4.59183673e+00
   5.50812162e+00   6.50770512e+00   7.59058726e+00   8.75676801e+00
   1.00062474e+01   1.13390254e+01   1.27551020e+01   1.42544773e+01
   1.58371512e+01   1.75031237e+01   1.92523948e+01   2.10849646e+01
   2.30008330e+01   2.50000000e+01]
In [3]:
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()

A Real Example

Here's a screen-shot of some work I did in a notebook just the other day.

[Screenshot: a notebook mixing code, figures, explanatory text, and dated entries]

As you can see in the above screenshot, I have a mixture of code, figures, explanatory text, code snippets, and it's all dated (as all research records should be).

So, hopefully you are now convinced that they might be useful. The remainder of this document explains how to install jupyter notebooks, how to add extensions that enhance the functionality thus making jupyter notebooks more useful as research records, and some common problems that you may encounter (with solutions where possible).

Jupyter Notebooks

Installation

To install jupyter, run the following command:

pip3 install --user jupyter

or

pip install --user jupyter

for a python3 or python2 version respectively. The remainder of this document assumes python3.

The --user flag installs into the user's home directory. This is necessary on systems where sudo access is not available. Even where sudo access is available, the --user flag is still preferable because it avoids interfering with packages managed by the system package manager.

The above installs jupyter into ~/.local/bin, so you will need to add that location to your system search path by putting the following into your ~/.bashrc file:

export PATH="$PATH:$HOME/.local/bin"
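To check whether the PATH change has taken effect, something like the following sketch may help. It uses only the Python standard library; the helper name `on_search_path` is made up for this illustration, and it assumes the usual Linux layout where user-installed scripts live in a `bin` directory under the user base.

```python
import os
import shutil
import site

# Directory where `pip install --user` normally places scripts on Linux
# (assumption: <user base>/bin, e.g. ~/.local/bin).
user_bin = os.path.join(site.getuserbase(), "bin")

def on_search_path(directory, path_value):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    entries = [os.path.expanduser(p) for p in path_value.split(os.pathsep) if p]
    target = os.path.normpath(directory)
    return target in (os.path.normpath(p) for p in entries)

print("User script directory:", user_bin)
print("On PATH:", on_search_path(user_bin, os.environ.get("PATH", "")))
# shutil.which returns None if no `jupyter` executable can be found.
print("jupyter found at:", shutil.which("jupyter"))
```

If the second line prints False, open a new terminal (or source ~/.bashrc) and try again.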

Alternatively, the Anaconda Distribution comes with Jupyter Notebooks included.

For further details, please see the Jupyter installation page.

Startup

To start a jupyter notebook server, navigate to a suitable working directory and type the following command into the terminal:

jupyter notebook

This starts a notebook server and automatically opens it in the browser. You should get something like this:

[Screenshot: the notebook server landing page in the browser]

Basic Functionality

Here's a quick list of the main stuff that I do on a day-to-day basis:

  • Click New in the top-right corner to start a new notebook.
  • Type in some python code and press Shift+Enter to execute.
  • Change the cell type from Code to Markdown using the drop-down box (top-middle) to write explanatory text. You can also put in maths using LaTeX syntax, code snippets, images, links, etc. Markdown guidance is available here.
  • Add new cells using the '+' button (top-left).
  • Save the notebook using the disk button (top-left). Notebooks are saved automatically, but I have found that this is not 100% reliable, so I still click save periodically and before stopping work. You might want to test it out yourself on some non-critical work.
  • Run cells in various ways (run all, run selection, run all above, run all below, etc.) using the options in the 'Cell' menu.
  • Interrupt the kernel, restart it, clear all output, etc. using the options in the 'Kernel' menu.
  • Download as python script, HTML, LaTeX, etc. using the options under 'File > Download as'.
  • Split and merge cells using the options in the 'Edit' menu.

Notebook Extensions

Extensions are great! They can be used to turn an already useful tool into a viable scientific laboratory notebook. Some set-up is required as follows. NB: All jupyter notebook instances must be shut down before making the changes below. If you encounter problems, see "Killing Lost Servers" below.

The Main (Unofficial) Repository

There is no official extensions repository but you can install the main unofficial repository as follows:

  1. Install the python package:

     pip3 install --user jupyter_contrib_nbextensions
  2. Copy files to jupyter path (use the following in both cases):

      jupyter contrib nbextension install --user
  3. Enable configurator (use the following in both cases):

      jupyter nbextensions_configurator enable --user
  4. Restart the server and the landing page should now have a 'Nbextensions' tab where you can browse and enable extensions.

For further details please refer to the jupyter_contrib_nbextensions documentation and the configurator GitHub site (the above is based on those pages).

Another (Unofficial) Repository

There is another repository of extensions which can be added following steps equivalent to 1 and 2 above:

  1. Download using wget or git as you prefer:

     wget https://github.com/Calysto/notebook-extensions/archive/master.zip
     unzip master.zip
     cd notebook-extensions-master
    

    or

     git clone https://github.com/Calysto/notebook-extensions.git
     cd notebook-extensions
  2. Copy files to jupyter path as above:

      jupyter nbextension install calysto --user

The above can be done while the server is still running but you will have to restart the server to see the additional extensions. On doing so, they should be available within the 'Nbextensions' tab.

For further details, please refer to the Calysto/notebook-extensions documentation (the above is based on that page).

A List of Useful Extensions for Scientific Lab Books

Here's the list of extensions that I use. I find most of these extensions are pretty useful if you're using jupyter notebooks as a lab book.

  • AddBefore (This extension enables the Add Cell before button)
  • contrib_nbextensions_help_item (enabled by default)
  • ExecuteTime (Display when each cell has been executed and how long it took)
  • Freeze (Freeze cells (forbid editing and executing) or make them read-only)
  • Initialization cells (Mark certain cells as 'initialization' cells to be run with one click - useful for imports, data loading, etc. - NB: to see the tick boxes, go to 'View > Cell Toolbar > Initialization Cell')
  • Nbextensions dashboard tab (enabled by default)
  • Nbextensions edit menu item (enabled by default)
  • Spell-Check Markdown (adds a spell-check button for turning on/off spelling checker)
  • Table of Contents (2) (add a table of contents as a sidebar and/or cell based on markdown headers)

Some extensions might be greyed out depending on which version of Jupyter you are running. This is because some extensions are not explicitly compatible with certain versions of Jupyter. To get around this, you can either install a different version of Jupyter or use the option within the configurator that allows enabling "incompatible" extensions, which might work anyway.

One Final Extra Extension

The above extensions are useful but there's one crucial thing missing for a scientific laboratory notebook... automatic date stamping! There is a datestamper extension but it only inserts the date when clicked (and it's text within a cell rather than metadata), so it's easy to forget (I know - I tried it). So an automatic datestamper similar to the ExecuteTime extension above would be useful.

Unfortunately, I could not find any such extension... so I wrote my own! So, three points:

  1. To install my DateAdded extension, please enter, from a suitable place on your system, the following commands (equivalent to steps 1 and 2 from above):

     git clone https://gitlab.com/steve_biggs/date_added.git
     jupyter nbextension install date_added --user
  2. Extensions are written in javascript. I didn't know any javascript when I started writing the extension... and I still don't! I just worked out how to get the functionality I wanted by hacking about with ExecuteTime. The point is, you can write your own extensions too! For a very basic tutorial, see the git history of my DateAdded extension (available by cloning the repository, as above). This starts off with a javascript "hello world" extension and progresses from there.

  3. I also modified ExecuteTime so that it prints the execute time for markdown cells too (by default it only does so for code cells) and thus effectively provides "last edited" information for markdown cells (well, strictly speaking, it's "last rendered" but that's close enough!). My modified version of ExecuteTime is available for you to download here. Simply find ExecuteTime.js within the location where notebook extensions get installed (usually ~/.local/share/jupyter/nbextensions/ or similar) and replace the original version with my modified version.
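For the last point, a small sketch like the one below could locate ExecuteTime.js and keep a copy of the original before you overwrite it. The function names `find_extension_files` and `backup`, and the '.orig' suffix, are hypothetical choices for this example; the search root is just the usual location mentioned above and may differ on your system.

```python
from pathlib import Path
import shutil

def find_extension_files(root, filename="ExecuteTime.js"):
    """Return every file called `filename` below `root` (empty list if none)."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(root.rglob(filename))

def backup(path):
    """Copy `path` to a sibling file with '.orig' appended and return the copy."""
    copy = path.with_name(path.name + ".orig")
    shutil.copy2(path, copy)
    return copy

# Usual install location according to the text; adjust if yours differs.
for found in find_extension_files("~/.local/share/jupyter/nbextensions"):
    print("Found:", found)  # call backup(found) before replacing the file
```

The loop only prints what it finds, so it is safe to run as-is; nothing is copied until you call `backup` yourself.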

Troubleshooting / Gotchas / Advanced Usage

There's No Undo! (Or At Least It Doesn't Work Properly)

This is probably the biggest problem with jupyter notebooks. The usual Ctrl+Z doesn't work if you delete some content from a cell. It's so easy to delete something and then realize it is needed, so this has left me annoyed a few times. There is an undo if you delete the whole cell but not for text within it. Surprisingly, it's not been that much of a problem for me - less so than I would have imagined anyway - but yeah, be careful! And use version control (see below).

Notebook Files Are Plain Text So Version Control Them

Notebooks are saved as *.ipynb files. These are plain text files that contain the code and markdown that you type along with a load of meta-data about the notebook and its cells. This means that these files can easily be tracked with a version control system like git. I strongly recommend that you do this, especially if you are developing computational analysis procedures within your notebooks.
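To make the "plain text" point concrete: the text inside an .ipynb file is JSON, so it can be read with nothing more than the standard library. The sketch below builds a tiny hand-written notebook in the version 4 layout (the contents are invented for illustration) and lists its cells; `summarise_cells` is a hypothetical helper name.

```python
import json

# A hand-written, minimal notebook in the nbformat 4 layout (illustrative only).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["print('Hello world!')"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["Hello world!\n"]}]},
    ],
}, indent=1)

def summarise_cells(text):
    """Return (cell_type, first_source_line) for each cell in a notebook string."""
    nb = json.loads(text)
    summary = []
    for cell in nb["cells"]:
        source = "".join(cell["source"])  # source may be stored as a list of lines
        first_line = source.splitlines()[0] if source else ""
        summary.append((cell["cell_type"], first_line))
    return summary

for cell_type, first_line in summarise_cells(notebook_text):
    print(cell_type, "|", first_line)
```

Because everything (code, markdown, metadata and outputs) sits in that one text file, a tool like git can store and diff it like any other source file.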

Don't Let Your Notebooks Get Too Big

Seriously, don't let your notebooks get too big. There are two reasons for this:

  1. Basic readability. When they get too big, they become too hard to navigate (even with a table of contents) and you forget where information is. Better to break your work down into smaller chunks with a new notebook (with a helpful title).

  2. Performance. When the notebook gets too big, there is too much processing to do and it starts to take ages to load and then slows down during use. I've also had one notebook that got way too big to the point where it now causes some sort of crash during extension initialization so I don't have table of contents or date added information anymore (the latter is still in the metadata, just not displayed).

It's up to you to define how big a notebook can get before it's "too big" - just use your common sense (and be warned!).
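If you want a rough number to go with your common sense, the sketch below measures how many bytes each cell contributes to a notebook, which can help spot the cells (often ones with large embedded outputs) that are worth moving to a new notebook. The helper name `cell_sizes` and the example data are made up for this illustration.

```python
import json

def cell_sizes(notebook):
    """Return (index, cell_type, bytes) for each cell, largest cell first."""
    sizes = []
    for index, cell in enumerate(notebook["cells"]):
        n_bytes = len(json.dumps(cell).encode("utf-8"))
        sizes.append((index, cell["cell_type"], n_bytes))
    return sorted(sizes, key=lambda item: item[2], reverse=True)

# Tiny in-memory stand-in; for a real file, load it with json.load() instead.
example = {"cells": [
    {"cell_type": "markdown", "source": ["Short note"]},
    {"cell_type": "code", "source": ["plot()"],
     "outputs": [{"output_type": "display_data",
                  "data": {"image/png": "A" * 5000}}]},
]}

for index, cell_type, n_bytes in cell_sizes(example):
    print(f"cell {index} ({cell_type}): {n_bytes} bytes")
```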

Notebooks Can Run Other Programming Languages

If, for any reason, you need to use a programming language other than python, this can be done within jupyter notebooks. All you need is an appropriate Jupyter kernel. Kernels have been written for most of the popular languages. Our own Peter Hill even wrote his own Fortran kernel so you can run Fortran via an interactive interpreter. I haven't tried this myself so, if you're interested, talk to Pete.

Jupyter Notebooks Depend on Cookies

If, like me, you use a cookie blocker, then Jupyter Notebooks simply won't work. I found that even if I tried to white-list localhost it wouldn't work. My solution was to install another browser (midori), then use my main browser (with cookie blocker) for general browsing and midori for jupyter notebooks exclusively. This has the added bonus that the notebook is in a separate window with a different icon, which helps to differentiate the notebook from general browsing. To get this to work reliably, I had to write a little script because my system kept trying to default to the wrong browser. In case you face a similar situation, the script is available here. (NB: This will only work when using tokens to log on, i.e. won't work when using a password to log on. The default behavior is to use tokens so if you don't know what I'm on about, then it will probably work.)

Killing Lost Servers

Sometimes, a server can still be running but the process is neither in the foreground nor the background - the server has somehow got lost! To kill the lost server, one has to first find its process ID, then use the kill command on that process ID. This is done as follows:

  • List all running notebook servers:

      jupyter notebook list
  • Take note of the port number, which is the bit after 'localhost:'

  • Run netstat to get a list of running processes and pipe the output to grep for the port number

      netstat -tulpn 2>&1 | grep <port_number>
  • Take note of the process ID, which is the number before the '/python3' at the end of the line

  • Kill the process:

      kill <process_id>

I have written that up into a script which uses regular expressions to match the port number and process ID. This kills all running notebook servers that it can find - to only kill a certain one, please use the manual method above. The script is available for you to download here. (NB: This will only work when using tokens to log on, i.e. won't work when using a password to log on. The default behavior is to use tokens so if you don't know what I'm on about, then it will probably work.)
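The following is not that script, just a minimal sketch of the regular-expression idea applied to the manual steps above. The sample text stands in for the output of the two commands and is illustrative only (real output will differ on your machine); the function names `find_ports` and `find_pid` are invented for this example.

```python
import re

# Illustrative sample text only; real command output will differ.
SAMPLE_SERVER_LIST = """Currently running servers:
http://localhost:8888/?token=0123abcd :: /home/user/project
http://localhost:8899/?token=4567efab :: /home/user/other
"""
SAMPLE_NETSTAT = """tcp   0   0 127.0.0.1:8888   0.0.0.0:*   LISTEN   12345/python3
tcp   0   0 127.0.0.1:631    0.0.0.0:*   LISTEN   -
tcp   0   0 127.0.0.1:8899   0.0.0.0:*   LISTEN   23456/python3
"""

def find_ports(server_list):
    """Extract the port numbers that follow 'localhost:' in the server list."""
    return [int(port) for port in re.findall(r"localhost:(\d+)", server_list)]

def find_pid(netstat_output, port):
    """Return the process ID listening on `port`, or None if it is not found."""
    # Match ':<port>' followed by whitespace, then the number before '/python'.
    pattern = re.compile(r":%d\s.*?\s(\d+)/python" % port)
    for line in netstat_output.splitlines():
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None

for port in find_ports(SAMPLE_SERVER_LIST):
    print("port", port, "-> PID", find_pid(SAMPLE_NETSTAT, port))
    # On a real system the final step would be os.kill(pid, signal.SIGTERM).
```

The sketch deliberately stops short of killing anything, so you can check that the right process IDs are being picked out first.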

Single-user Notebooks on Remote Machines

Firstly, please note: you must be careful that you don't open a security vulnerability when doing stuff like this! Please read the guidance here before continuing.

You can run a notebook server on a remote machine, e.g. if your data set is too big to download or if you want to leave your server running and access it from multiple devices. NB: This is not a multi-user solution (see below).

A naive way to do this would be to just use ssh -X <remote-machine> and then start the server as usual. However, when I tried this on kink.its.york.ac.uk (i.e. a server on campus, so it should be a good connection), I found that the performance was poor. It turns out that sending the graphics over the network causes the slowdown. Therefore, a better method is as follows:

  • ssh to the server without the -X flag, i.e.

      ssh <remote_user>@<remote_host>
    

    e.g.

      ssh snb519@kink.its.york.ac.uk
  • Start the server without the automatic browser pop-up and assigned to a port of your choosing:

      jupyter notebook --no-browser --port=<remote_port>
    

    e.g.

      jupyter notebook --no-browser --port=8899
  • The above process could even be detached from your current session so you can exit from ssh but leave the notebook server running.

  • On your local machine, open an ssh tunnel to re-direct the remote port to a local port of your choosing:

      ssh -N -L localhost:<local_port>:localhost:<remote_port> <remote_user>@<remote_host>
    

    e.g.

      ssh -N -L localhost:8890:localhost:8899 snb519@kink.its.york.ac.uk
    

    or, if you're off campus and you're trying to access a University of York server:

      ssh -J <remote_user>@ssh.york.ac.uk -N -L localhost:<local_port>:localhost:<remote_port> <remote_user>@<remote_host>
    

    e.g.

      ssh -J snb519@ssh.york.ac.uk -N -L localhost:8890:localhost:8899 snb519@kink.its.york.ac.uk
    

    (NB: You will need access to ssh.york.ac.uk for this to work, which can be requested via IDM or IT support.)

  • Point your browser to the URL given when you started the remote notebook server but using the local port you have defined. In this example, the address would be http://localhost:8890/?token=<some_long_hash> whereas the server would have given you http://localhost:8899/?token=<some_long_hash>.
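The port bookkeeping in the steps above can be sketched in a few lines of Python. This only builds the ssh command and rewrites the URL (it does not run anything); the function names `tunnel_command` and `local_url` are hypothetical, and the ports and host are the ones from the example.

```python
from urllib.parse import urlsplit, urlunsplit

def tunnel_command(local_port, remote_port, remote_user, remote_host):
    """Build the ssh port-forwarding command as an argument list."""
    forward = f"localhost:{local_port}:localhost:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{remote_user}@{remote_host}"]

def local_url(remote_url, local_port):
    """Swap the port in the URL printed by the remote server for the local port."""
    parts = urlsplit(remote_url)
    return urlunsplit(parts._replace(netloc=f"{parts.hostname}:{local_port}"))

print(" ".join(tunnel_command(8890, 8899, "snb519", "kink.its.york.ac.uk")))
print(local_url("http://localhost:8899/?token=<some_long_hash>", 8890))
```

The argument list returned by `tunnel_command` could be passed to `subprocess.run` if you wanted to open the tunnel from a script rather than typing the command by hand.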

For further details, see this page, which is where I got the above information. If you try it and you find (like me) that copying tokens back and forth is annoying, then try setting up password access following this guide where the key steps are:

  • Generate a default config file (~/.jupyter/jupyter_notebook_config.py):

      jupyter notebook --generate-config
  • Generate a hashed password (saved to ~/.jupyter/jupyter_notebook_config.json) by entering the following command and then typing your desired password when prompted:

      jupyter notebook password
  • Copy the hashed password from ~/.jupyter/jupyter_notebook_config.json into the appropriate line in ~/.jupyter/jupyter_notebook_config.py (and uncomment) so you end up with something like this:

      c.NotebookApp.password = u'sha1:67c9e60bb8b6:9ffede0825894254b2e042ea597d771089e11aed'
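To avoid copy-and-paste slips in the last step, a sketch like the following could print the line to paste into the config file. It assumes the JSON file stores the hash under a "NotebookApp" / "password" key, and it reads from an in-memory stand-in rather than your real file; `config_line` is a hypothetical helper name.

```python
import json

def config_line(json_text):
    """Build the config-file line from the JSON written by the password command.

    Assumes the layout {"NotebookApp": {"password": "<hash>"}}.
    """
    hashed = json.loads(json_text)["NotebookApp"]["password"]
    return f"c.NotebookApp.password = {hashed!r}"

# Stand-in for the contents of ~/.jupyter/jupyter_notebook_config.json,
# using the example hash from the text above.
example_json = json.dumps({"NotebookApp": {
    "password": "sha1:67c9e60bb8b6:9ffede0825894254b2e042ea597d771089e11aed"}})

print(config_line(example_json))
```

In Python 3 the printed line has no u prefix on the string, which is equivalent to the form shown above.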

Multi-user Notebooks

There is a thing called JupyterHub, which is the proper way to host a multi-user notebook server. It might be useful for collaboration and could potentially be used for teaching. However, I have not investigated this in detail as there is no need for it yet. If lots of people start using jupyter notebooks, then we could look into whether JupyterHub would be of benefit. Work is also ongoing to facilitate real-time live collaboration by multiple users on the same notebook - more information is available here and here.

Summary

Jupyter notebooks are useful as a scientific research record, especially when you are digging about in your data using computational tools. This document has explained the features and benefits, how to install jupyter notebooks, and how to install and enable extensions that provide useful functionality. A number of common issues have also been discussed.

I hope you found this useful. If, after trying jupyter notebooks, you find they are useful and you continue using them, please let me know (snb519 (at) york (dot) ac (dot) uk) as it would be nice to have a list of local users so we can share tips and tricks. Also, if we get enough local users, it might be worth investigating JupyterHub too.

Enjoy!