Jupyter Book MyST Markdown Notebooks

Jupyter Book also lets you write text-based notebooks using MyST Markdown.

Create a notebook with MyST Markdown#

MyST Markdown notebooks are defined by two things:

1. YAML metadata at the top of the file, which tells Jupyter Book whether and how to convert the text file to a notebook (including which kernel to use). See the YAML at the top of this page for an example.
2. The presence of {code-cell} directives, whose contents are executed when your book is built.

That’s all!

An example of the YAML metadata needed is:

---
jupytext:
  cell_metadata_filter: -all
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
--- 

See the Notebooks with MyST Markdown documentation for more detailed instructions.
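The two ingredients above can be illustrated with a small check. This is only a heuristic sketch in plain Python, not how Jupyter Book itself detects notebooks: the function name `looks_like_myst_notebook` and the sample strings are hypothetical, and it only recognises front matter that opens on the first line and code cells written with three backticks.

```python
FENCE = "`" * 3  # three backticks, built this way so the example stays readable


def looks_like_myst_notebook(text: str) -> bool:
    """Heuristic: YAML front matter with a kernelspec, plus a code-cell directive."""
    lines = text.splitlines()
    # Front matter must open on the very first line with '---'.
    if not lines or lines[0].strip() != "---":
        return False
    # Locate the closing '---' that ends the front matter.
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return False
    front_matter = lines[1:end]
    body = lines[end + 1:]
    has_kernelspec = any(line.startswith("kernelspec:") for line in front_matter)
    has_code_cell = any(line.startswith(FENCE + "{code-cell}") for line in body)
    return has_kernelspec and has_code_cell


# Hypothetical inputs: one minimal text-based notebook, one ordinary Markdown file.
notebook_text = "\n".join([
    "---",
    "kernelspec:",
    "  name: python3",
    "---",
    "",
    FENCE + "{code-cell}",
    "print(2 + 2)",
    FENCE,
])
plain_text = "# Just a heading\n\nSome prose."

print(looks_like_myst_notebook(notebook_text))
print(looks_like_myst_notebook(plain_text))
```

A file that fails this check is still valid Markdown; it is simply rendered as a regular page without executing anything.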

An example cell#

With MyST Markdown, you can define code cells with a directive like so:

```{code-cell}
print(2 + 2)
```

When your book is built, the contents of any {code-cell} blocks will be executed with your Jupyter kernel, and their outputs will be displayed in-line with the rest of your content.

Let’s download the penguins.csv dataset from the hsf-training deep-learning-intro-for-hep course repository into a data directory:

mkdir -p data
wget -P data https://raw.githubusercontent.com/hsf-training/deep-learning-intro-for-hep/refs/heads/main/deep-learning-intro-for-hep/data/penguins.csv

Install pandas in the environment you are using to build your book, and then run the following code cell to load the dataset:

```{code-cell} ipython3
import pandas as pd
penguins_df = pd.read_csv("data/penguins.csv")
penguins_df
```

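The argument after {code-cell} (here, ipython3) is there for readability only. The kernel that runs the cell comes from the kernelspec in the YAML metadata, not from this argument.

Build the book, and you should see the output of the code cell above in the rendered page.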

Visualizing data distributions#

Let’s use Matplotlib to visualize a regression problem with this dataset.

For our regression problem, let's ask, "Given a flipper length (mm), what is the penguin's most likely body mass (g)?"

```{code-cell} ipython3
# Drop rows with missing values, then split the two columns into 1-D arrays.
regression_features, regression_targets = penguins_df.dropna()[["flipper_length_mm", "body_mass_g"]].values.T
```
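To see what the one-liner above does step by step, here is a minimal sketch on a three-row stand-in table. The rows and the `toy_df` name are invented for illustration and are not taken from penguins.csv; only the column names match the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; all values are made up.
toy_df = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "flipper_length_mm": [181.0, np.nan, 195.0],
        "body_mass_g": [3750.0, 5000.0, 3800.0],
    }
)

# dropna() discards every row that has a missing value in *any* column,
# not only in the two columns selected afterwards.
clean = toy_df.dropna()

# Selecting two columns gives an array of shape (n_rows, 2); .T transposes it
# to (2, n_rows), which unpacks into one 1-D array per column.
features, targets = clean[["flipper_length_mm", "body_mass_g"]].values.T

print(features.shape, targets.shape)
print(features, targets)
```

Because `dropna()` looks at all columns, a row can be dropped for a missing value in a column the regression never uses. Selecting the two columns first and calling `dropna()` afterwards would keep such rows; which behaviour you want is a modelling choice.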

```{code-cell} ipython3
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

def plot_regression_problem(ax, xlow=170, xhigh=235, ylow=2400, yhigh=6500):
    ax.scatter(regression_features, regression_targets, marker=".")
    ax.set_xlim(xlow, xhigh)
    ax.set_ylim(ylow, yhigh)
    ax.set_xlabel("flipper length (mm)")
    ax.set_ylabel("body mass (g)")

plot_regression_problem(ax)

plt.show()
```

Working with packages#

You can install any packages you need in the environment you are using to build your book. It is good practice to manage your dependencies with a requirements.txt file.

Let’s add scikit-learn (imported in Python as sklearn) to our requirements.txt:

jupyter-book
matplotlib
numpy
pandas
scikit-learn

and install the packages with:

pip install -r requirements.txt
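Before building, it can help to confirm that the environment actually contains these packages. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and the list simply repeats the requirements shown above.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Distribution names as they appear in requirements.txt.
required = ["jupyter-book", "matplotlib", "numpy", "pandas", "scikit-learn"]
print("missing:", missing_packages(required))
```

Note that this checks distribution names, which is why the entry is scikit-learn here even though the import in code cells is `sklearn`.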

Now we can use scikit-learn to build a regression model for our problem. Let's use its `LinearRegression` estimator:

```{code-cell} ipython3
from sklearn.linear_model import LinearRegression
import numpy as np
```

```{code-cell} ipython3
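# scikit-learn expects a 2-D feature array of shape (n_samples, n_features),
# so the 1-D flipper lengths get an extra axis before fitting.
best_fit = LinearRegression().fit(regression_features[:, np.newaxis], regression_targets)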
```
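The `[:, np.newaxis]` indexing in the fit above turns a 1-D array into a single-column 2-D array. A minimal sketch with made-up flipper lengths (the variable names are illustrative only):

```python
import numpy as np

flipper_lengths = np.array([180.0, 190.0, 200.0, 210.0])  # invented values

# Adding a new axis makes each sample a row with one feature.
column = flipper_lengths[:, np.newaxis]

print(flipper_lengths.shape)  # (4,)
print(column.shape)  # (4, 1)

# reshape(-1, 1) is an equivalent way to write the same thing.
same_column = flipper_lengths.reshape(-1, 1)
print(np.array_equal(column, same_column))  # True
```

The same reshaping is needed whenever you call `predict` on new 1-D inputs.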

```{code-cell} ipython3
fig, ax = plt.subplots()

def plot_regression_solution(ax, model, xlow=170, xhigh=235):
    model_x = np.linspace(xlow, xhigh, 1000)
    model_y = model(model_x)
    ax.plot(model_x, model_y, color="tab:orange")

# Wrap predict() so the plotting helper can pass a 1-D array of x values.
plot_regression_solution(ax, lambda x: best_fit.predict(x[:, np.newaxis]))
plot_regression_problem(ax)

plt.show()
```

```{code-cell} ipython3
print("slope:", best_fit.coef_[0])
print("intercept:", best_fit.intercept_)
```
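For a single feature, an ordinary least-squares fit with an intercept has a closed form, so the slope and intercept can be cross-checked by hand. The sketch below is plain Python on made-up numbers, not the penguin data, and `fit_line` is a hypothetical helper rather than anything from scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented, perfectly linear data: mass = 50 * flipper - 5500.
flippers = [180.0, 190.0, 200.0, 210.0]
masses = [3500.0, 4000.0, 4500.0, 5000.0]

slope, intercept = fit_line(flippers, masses)
print("slope:", slope)  # 50.0
print("intercept:", intercept)  # -5500.0
```

Applied to `regression_features` and `regression_targets`, this formula should agree with `best_fit.coef_[0]` and `best_fit.intercept_` up to floating-point rounding.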

Quickly add YAML metadata for MyST Notebooks#

If you have a Markdown file and want Jupyter Book to treat it as a MyST Markdown notebook, you can quickly add the required YAML metadata with the following command:

jupyter-book myst init path/to/markdownfile.md
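Conceptually, the result is a file whose first lines are the YAML metadata. The sketch below shows that idea in plain Python; it is not the command's implementation. `with_notebook_metadata` is a hypothetical helper, the header is copied from the example metadata earlier on this page, and treating any leading `---` as existing front matter is a simplifying assumption.

```python
HEADER = """---
jupytext:
  cell_metadata_filter: -all
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---
"""


def with_notebook_metadata(markdown_text: str) -> str:
    """Prepend the notebook header unless the text already starts with front matter."""
    # Assumption: a leading '---' means front matter is already present.
    if markdown_text.lstrip().startswith("---"):
        return markdown_text
    return HEADER + "\n" + markdown_text


page = "# My analysis\n\nSome prose.\n"  # invented page content
converted = with_notebook_metadata(page)
print(converted)
```

Calling the helper a second time leaves the text unchanged, so it is safe to apply to a file that has already been converted.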