
I’m using the Jupyter extension (v2022.9.1303220346) in Visual Studio Code (v1.73.1).

To reproduce this issue, make any modification to the notebook and check it into git. You’ll observe that you get an extra difference for execution_count. For example (display from Git Gui):

-   "execution_count": 7,
+   "execution_count": 9,

The execution count doesn’t appear to be useful and is noise in the git history. Can Jupyter or VS Code be configured to stop updating this value or (better) ignore it altogether?

2 Answers


  1. Can Jupyter or VS Code be configured to stop updating this value or (better) ignore it altogether?

    I’m not sure about VS Code, but I think the answer for VS Code config options might be no. I read through several GitHub feature-request issues for Jupyter notebooks, and the fact that these are still open feature requests suggests that the answer is currently no for Jupyter as well. The same discussions do show plenty of approaches to tackling the problem:

    • In jupyter/notebook: Suggestion: Separate file for notebook executed cell outputs. #5677

      I think it would be nice to have a separate file (something like .ipynb.output) that links output to their cells in the .ipynb json file. This would make it significantly easier to exclude notebook outputs in source control systems like git. – jbursey

      It's not a bad idea. But if keeping cell output out of source control is your primary concern, the easiest solution is to just clear the outputs before committing. There are a few ways to do that:

      Use a commit hook as outlined in Jupyter docs.

      Some folks also choose to just convert the notebook to python using nbconvert and then just commit that. If you search for "How to version control jupyter notebooks" you will see a bunch of posts on the topic.

      – gitjeff05

      Alternatively, Jupytext could be helpful for your case. It allows you to save notebooks as code. Then you only need to commit the code to git, whilst you can ignore the notebooks for version control.

      Their paired notebooks avoid the need for automatically saving and converting the notebooks.

      – IvoMerchiers

    • In jupyterlab/jupyterlab: Using a notebook & git creates too many diff #9444

      It would be much simpler if we had an option to save only the input cells, not the output ones. And to reset the cell index (execution_count) to 0 without restarting the kernel. – sylvain-bougnoux

      I think that you can configure the underlying nbdiff to ignore outputs, see: https://nbdime.readthedocs.io/en/latest/config.html#configuring-ignores – krassowski

    • In jupyterlab/jupyterlab-git: Cleaning Notebook cell outputs #392

      Notebooks cell outputs can be a hindrance in Version Control while reviewing the diff of a commit to see what changed (either in a PR or historically)

      Some ideas on how we could enable users to deal with outputs in cell in jupyterlab-git

      1. Enable a Command Palette option to easily install a Git filter with nbstripout
      2. Prompt the user to remove outputs from cells if we detect that there are cell outputs during a git push
      3. Use the JupyterLab settings registry to let the user specify that all Notebook outputs must be cleaned on a git push

      – jaipreet-s

      With #700, it is now possible to add nbstripout (for example) when initializing a git repository. – fcollonval

    For your learning purposes / reference, I found this info by googling "github issues jupyter notebook put execution_count in separate file" and looking through the top search results and linked GitHub issues in their discussion threads.
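The "clear the outputs before committing" approach from the threads above can be sketched in a few lines of standard-library Python. This is only a rough illustration of what a stripping filter or commit hook does, not the nbstripout implementation; the function names `strip_notebook` and `strip_file` are hypothetical, and the sketch assumes an nbformat 4 notebook with a top-level `cells` list.

```python
import json
from pathlib import Path


def strip_notebook(nb: dict) -> dict:
    """Reset the volatile fields of every code cell in a notebook dict.

    Assumes the nbformat 4 layout (top-level "cells" list). Markdown and
    raw cells are left untouched.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # written out as JSON null
            cell["outputs"] = []            # drop all stored outputs
    return nb


def strip_file(path: str) -> None:
    """Rewrite the .ipynb file at `path` in place with stripped cells."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    text = json.dumps(strip_notebook(nb), indent=1, ensure_ascii=False)
    p.write_text(text + "\n", encoding="utf-8")
```

Calling `strip_file("notebook.ipynb")` before `git add` would keep `execution_count` at `null` in every commit, so it no longer shows up in diffs. For regular use, an established tool such as nbstripout is probably the safer choice.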

  2. The .ipynb format contains your input code cells, output data and a variety of metadata to reproduce the exact form you see when running the notebook interactively.

    The "execution_count" is unfortunately only one of them. Many more (collapsed cell state, extension metadata, and so on) are stored even though they do not represent any change to the code of the notebook. It is therefore not really possible to preserve all of this information and still get meaningful diffs in git. While there are discussions about which data to keep or throw out for version control purposes, the underlying JSON format is not ideal for this purpose anyway; for example, each line in each cell gets encoded like this:

       "source": [
        "for fizzbuzz in range(101):\n",
        "    \n",
        "    if fizzbuzz % 3 == 0 and fizzbuzz % 5 == 0:\n",
        "        print(\"fizzbuzz\")\n",
        "        continue\n",
        "        \n",
        "    elif fizzbuzz % 3 == 0:\n",
        "        print(\"fizz\")\n",
        "        continue\n",
        "        \n",
        "    elif fizzbuzz % 5 == 0:\n",
        "        print(\"buzz\")\n",
        "        continue\n",
        "        \n",
        "    print(fizzbuzz)"
       ]
      },
    

    which is rather hard to read compared to the underlying code.

    One way out of this is to use the Jupytext extension. It pairs your .ipynb file with a regular .py file while keeping some of the metadata intact. The paired .py file can be viewed and edited with any editor, works well with git, and does not depend on the complete Jupyter infrastructure.
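To give a feel for what such a paired script contains, here is a minimal standard-library sketch that flattens a notebook dict into a script with `# %%` cell markers. It only approximates the idea and is not Jupytext's implementation: the function name `notebook_to_script` is hypothetical, cell metadata is ignored, and raw cells are skipped.

```python
def notebook_to_script(nb: dict) -> str:
    """Flatten a notebook dict into a percent-style Python script.

    Only the cell sources survive; execution counts, outputs, and
    metadata are dropped. Raw cells are skipped in this sketch.
    """
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
        elif cell.get("cell_type") == "markdown":
            # Markdown is kept as comments so the file stays valid Python.
            commented = "\n".join(
                ("# " + line).rstrip() for line in text.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"
```

Because the result carries no `execution_count` at all, re-running cells does not change the text that git sees; only edits to the code or markdown do.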
