Pre-commit and R Packaging

quarto
R
git
pre-commit
github actions
Author

Neil Shephard

Published

July 29, 2023

This post is aimed at getting you up and running with the R precommit Package. This shouldn’t be confused with the Python pre-commit Package although as you might suspect they are closely related.

The R package (precommit) consists of a number of R specific hooks that are run by pre-commit before commits are made and check various aspects of your code for compliance with certain style and coding standards (mostly aspects of R packages and I’ll be posting more on R packaging in due course).

A major part of this post is about getting things setup on Windows, I’ve only given a light overview of some of the hooks and common problems encountered as I’ve gone about using and learning R packaging because not everyone uses Windows.

Most of the work on the R package is by Lorenz Walthert, if you find it useful consider sponsoring his work, these things take a lot of time and effort and whilst they can be used for free are worth supporting.

pre-commit

I love using pre-commit in my development pipelines and have blogged about it a few times already. It saves so much hassle (once you are used to it) not just for yourself but also your collaborators who are reviewing your Pull Requests. The R precommit package comes with a set of hooks that can be enabled and configured individually. I’ve recently and reason to start making and R package and as I’ve not used R much for a few years and this was my first time attempting to develop a package, I decided to use the hooks to impose the various style standards and checks that are expected.

I opted to enable all of the hooks. I’ve not covered them all here in detail (yet) but describe some of them below and show how to use some additional hooks from the usethis package too.

codemetar

There is a hook for checking the Codemeta, which is in JSON-LD format is created correctly. The R package codemetar facilitates creating this and pulls metadata from the DESCRIPTION, README.Rmd and other aspects of your package to format them in JSON Codemeta. It comes with a handy function to write the file for you, so after installing you can just run codemetar::write_codemeta() which will create the codemeta.json for you. Remember to run this each and every time you update and of the files from which the metadata is created (although keep an eye on #491 which suggests updating automatically)

roxygenize

Roxygen2 is a package for making the documentation to go with your package, it does this by parsing the documentation strings (“docstrings” for short) that you adorn your functions with that describe the arguments and show example usages. This hook requires additional configuration in .pre-commit-config.yaml as you have to install your package dependencies. Fortunately there is a helper function in the precommit package so you can just run precommit::snippet_generate("additional-deps-roxygenize") and it will output the YAML that you need to add to your .pre-commit-config.yaml. It might look something like the following.

    hooks:
    - id: no-debug-statement
    - id: roxygenize
      additional_dependencies:
        -    data.table
        -    dplyr
        -    dtplyr
        -    duckdb
        -    IMD
        -    lubridate
        -    stringr

style-files

The style-files hook runs the styler package against your code to ensure it follows the tidyverse style guide by default, although it can be configured to use a custom style guide of your own creation.

lintr

The lintr package lints your code automatically. It can be configured by adding a .lintr configuration file to your repository, a simple example is shown below. Note the indented closing parenthesis is important you get a complaint about that and any other formatting issues.

linters: linters_with_defaults(
         line_length_linter(120),
         object_name_linter = NULL,
         object_usage_linter = NULL
  )

spell-check

This is a useful hook that checks your spelling and adds unusual words to a custom dictionary inst/WORDLIST.

deps-in-desc

This hook ensures that all dependencies that are loaded by your package are listed in the DESCRIPTION file so that when the package is installed the necessary dependencies are also pulled in, fairly essential..

usethis package

The usethis package is a compliment to the devtools package that has a lot of very useful helper functions. Some of these enable additional pre-commit hooks whilst others enable GitHub actions, which are part of Continuous Integration pipelines and I would highly recommend enabling them.

README.Rmd

The user_readme_rmd() function automatically generates a README.Rmd template and will also create a pre-commit hook that keeps it synchronised with README.md whenever you update it. This is useful because the later, plain-markdown, file is automatically rendered by GitHub/GitLab/Codeberg as your repositories front-page.

use_github_action()

Invoking use_github_action() within your package repository will prompt you for the type of action you wish to add to it. There are, as of writing, three options.

    > use_github_action()
    Which action do you want to add? (0 to exit)
    (See <https://github.com/r-lib/actions/tree/v2/examples> for other options)

    1: check-standard: Run `R CMD check` on Linux, macOS, and Windows
    2: test-coverage: Compute test coverage and report to https://about.codecov.io
    3: pr-commands: Add /document and /style commands for pull requests

Selecting one will write a file to /.github/workflows/<FILENAME>.yaml and then print out code to add a badge to your repository.

Selection: 1
    ✔ Adding '*.html' to '.github/.gitignore'
    ✔ Creating '.github/workflows/'
    ✔ Saving 'r-lib/actions/examples/check-standard.yaml@v2' to '.github/workflows/R-CMD-check.yaml'
    • Learn more at <https://github.com/r-lib/actions/blob/v2/examples/README.md>.
    • Copy and paste the following lines into 'README.Rmd':
      <!-- badges: start -->
      [![R-CMD-check](https://github.com/CUREd-Plus/cuRed/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/CUREd-Plus/cuRed/actions/workflows/R-CMD-check.yaml)
      <!-- badges: end -->
      [Copied to clipboard]

Badges

Most of the GitHub Action functions described above include output that can be copy and pasted into README.Rmd to include badges in your GitHub front page. Again the usethis has you covered and can generate the necessary code for the different badges it supports.

Gotchas

When starting out I found that I regularly didn’t pass the pre-commit hooks first time. This can be jarring and confusing to start with but its not something to worry about, they are there to ensure your code and package meet the standards required. If you ever come to submit to CRAN you will be grateful to have adhered to these standards.

Below I detail common “gotchas” I encountered when developing the package, what they mean and how to resolve them.

The following spelling errors were found:

The spell-check hook will fail if you’ve introduced new words that aren’t in standard dictionaries with messages similar to the those shown below. Sometimes these will be new words, sometimes they might be catching typos you have made. In the example below famiy should be family so you need to correct the source of the typo (and you’re told where this is, in this case it was line 27 of CITATION.cff), or if the new word should be added to the dictionary you will have to stage the updated inst/WORDLIST file for inclusion in your commit.

spell-check..............................................................Failed
- hook id: spell-check
- exit code: 1
- files were modified by this hook

 Using R 4.3.1 (lockfile was generated with R 4.2.1)
 Using R 4.3.1 (lockfile was generated with R 4.2.1)
The following spelling errors were found:
  WORD    FOUND IN
famiy   CITATION.cff:27
All spelling errors found were copied to inst/WORDLIST assuming they were not spelling errors and will be ignored in the future. Please  review the above list and for each word that is an actual typo:
 - fix it in the source code.
 - remove it again manually from inst/WORDLIST to make sure it's not
   ignored in the future.
 Then, try committing again.
Error: Spell check failed
Execution halted

! codemeta.json is out of date

If you modify the DESCRIPTION or CITATION.cff then the codemeta-description-updated hook will fail with error messages similar to the following.

codemeta-description-updated.............................................Failed
- hook id: codemeta-description-updated
- exit code: 1

 Using R 4.3.1 (lockfile was generated with R 4.2.1)
 Using R 4.3.1 (lockfile was generated with R 4.2.1)
Error:
! codemeta.json is out of date; please re-run codemetar::write_codemeta().
Backtrace:
    
 1. └─rlang::abort("codemeta.json is out of date; please re-run codemetar::write_codemeta().")
Execution halted

This means yo need to update the codemeta.json with

codemetar::write_codemeta()

Warning: Undocumented code objects:

If this error arises its because there is a .Rd file missing. You can generate these by ensuring you have the appropriate docstring definition prior to your function and then use the roxygen2::reoxygenise() function to generate the documentation automatically. Don’t forget to git stage and git commit the files to your repository, pushing if needed (e.g. a Continuous Integration pipeline is failing).

Windows

I haven’t used Windows for about 23 years but I often have colleagues who do and that was the case with the R package that I have started developing so I needed to get all members of the team up and running with the precommit R package/pipeline.

Windows doesn’t come with Python by default, but pre-commit is written in Python and so an environment is required in order to run the above pre-commit hooks. There are many options for this, including using Windows Subsystem for Linux (WSL). I opted to try the solution provided in the precommit vignette. This shows how to use the reticulate package which acts as a glue between R and Python, to handle installing a Miniconda environment and setting up precommit/pre-commit.

The following runs you through the things you need to install (R, RStudio, GitBash), setting up GitHub with SSH keys and enabling precommit for your R package locally.

Install R

When installing the defaults are fine, request admin permissions if required.

Install Rstudio

Defaults are fine, request admin permissions if required.

Install GitBash

During installation you’ll be asked a number of questions, if you’re unsure how to respond to any of them the following provides guidance.

  1. Text Editor - Configure with your choice of editor, obviously you’ll want to have Emacs available and select that! 😉
  2. Adjust your PATH environment - At the bare minimum go with the Recommended option and allow Git from the command line and also from 3rd-party software. Optionally I would recommend the third option of Use Git and optional UNIX tools from the Command Prompt, particularly if you are either a) familiar with UNIX commands or b) not at all familiar with them (as you won’t have to re-learn the Windows commands, just learn the Bash commands they are more widely applicable).
  3. Use Bundled SSH library - Use the bundled SSH library.
  4. Use Bundled OpenSSL library - Use the bundled OpenSSL library.
  5. Checkout Windows-style, commit Unix-style line endings - This is fine, it just changes the internal representation of the carriage return to be more universal.
  6. Use MinTTY - The default terminal of MSYS2 is fine and more functional than the Windows’ default console window.
  7. Default Merge behaviour - The default (fast-forward or merge) this is fine.
  8. Choose a credential helper - Select None here, we will let RStudio manage these.
  9. Configure Extra Options - Defaults are fine.
  10. Configuring experimental options - No need to enable any of these.

Configure Git

Start a GitBash shell and configure your email address and name.

git config --global user.email "you@example.com"
git config --global user.name "Your Name"

Configure RStudio/GitHub with SSH keys

  1. Start RStudio
  2. Create SSH key - Navigate to Tools > General Options > Git/SVN > Create SSH Key and under SSH key type select the default (ED25519) this is a very secure elliptic curve algorithm and is supported by GitHub. Use a secure password (i.e. long), do not change the location it is created at.
  3. Once created select View public key and use Ctrl + c to copy this to your clipboard.
  4. Navigate to GitHub and login then click on your avatar in the top right and select Settings > SSH and GPG keys > New SSH Key.
  5. Give the key a name and paste into the box below where indicated/instructed then click on Add SSH key.

Clone Repository

Its likely that you will have an existing repository that you wish to work on with this pipeline, if so you will have to clone it locally so you can work on it with the precommit pipeline. The following assumes you have added your SSH key to your GitHub account as described above.

  1. Navigate to the repository you wish to clone (e.g. https://github.com/CUREd-Plus/cuRed/) and click on the Code button then select SSH under the Local tab in the box that appears.
  2. Click on the box that has two small squares to the right of some text to copy the URL to clipboard.
  3. Return to RStudio and start a new project with File > New Project > Version Control > Git and paste the URL into the Repository URL. Select a location to clone to under Create project as subdirectory of:, e.g. c:/Users/<username>/work/cuRed (replacing <username> with your username).
  4. If prompted for password enter it. If asked to answer Yes\/No answer Yes and then if prompted to Store password for this session answer Yes.
  5. You should now have cloned the repository and have a project to work on.

Install pre-commit

As mentioned above pre-commit refers to two things, primarily it is the Python package pre-commit that does all the work of running Linting, Tests etc. before making commits. It also refers to an R package precommit (note the omission of the hyphen -) that works with the Python package to enable use of various R packages that carry out such checks. Because it is a Python package it needs a Python Virtual Environment to run. This may sound unfamiliar but don’t worry the R precommit package and documentation guides you through doing so, what follows is a rehash of the official documentation.

Install precommit and reticulate

From RStudio install the remotes and reticulate package, then install the most recent version of precommit directly from GitHub.

install.packages(c("remotes", "reticulate"))
remotes::install_github("lorenzwalthert/precommit")

Install Miniconda environment

You can now use reticulate to install a Miniconda virtual environment framework for R to run Python packages (i.e. pre-commit).

options(timeout=600)
reticulate::install_miniconda()

Install pre-commit framework

This step now installs the Python package pre-commit within a new Miniconda virtual environment (by default r-precommit). There will be a fair bit of output here as all the dependencies in Python for pre-commit are downloaded.

precommit::install_precommit()
precommit::autoupdate()

Use precommit with the existing project

You should have cloned the repository you wish to enable precommit to use (see above). You now need to enable precommit for this local copy of the repository. This will place a script in .git/hooks/pre-commit that says which Miniconda environment to use (r-precommit) and will activate this whenever a commit is made, the install_hooks = TRUE ensures that the R specific hooks and their required environments are installed (under \~/.config/pre-commit/).

Make sure you have opened the .Rproj file in RStudio, this ensures you are within the project directory that you want to install precommit to (alternatively used setwd()).

precommit::use_precommit(install_hooks = TRUE)
No matching items

Reuse

Citation

BibTeX citation:
@online{shephard2023,
  author = {Shephard, Neil},
  title = {Pre-Commit and {R} {Packaging}},
  date = {2023-07-29},
  url = {https://blog.nshephard.dev/posts/pre-commit-r/},
  langid = {en}
}
For attribution, please cite this work as:
Shephard, Neil. 2023. “Pre-Commit and R Packaging.” July 29, 2023. https://blog.nshephard.dev/posts/pre-commit-r/.