Persistent config and data for R packages

Highlights

  • Does your R package work best with some configuration? You probably want it to be easily found by your package. Does your R package download huge datasets that don’t change much on the provider side? Maybe you want to save the corresponding data somewhere persistent so that things will go faster during the next R session. In this blog post we shall explain how an R package developer can go about using and setting persistent configuration and data on the user’s machine.
  • “Applications can actually store user level configuration information, cached data, logs, etc. in the user’s home directory, and there is a standard way to do this [depending on the operating system].” R packages that are on CRAN cannot write to the home directory without getting confirmation from the user, but they can and should use standard locations. To find where those are, package developers can use the rappdirs package.
  • “Everyone likes it when you remember their name”. Everyone probably likes it too when the barista at their favourite coffee shop remembers their usual order. As an R package developer, what can you do for your R package to correctly assess user preferences and settings?
  • Using options
  • In R, options allow the user to set and examine a variety of global options which affect the way in which R computes and displays its results. For instance, for the usethis package, the usethis.quiet option can control whether usethis is chatty. Users either:
  • Users can use a project-level or more global user-level .Rprofile. The use of a project-level .Rprofile overrides the user-level .Rprofile unless the project-level .Rprofile contains the following lines as mentioned in the blogdown book:
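A project-level .Rprofile that also loads the user-level one could look like the sketch below. This is an assumption about the kind of lines meant, not a verbatim copy from the blogdown book; the path "~/.Rprofile" is the conventional user-level location and may differ on your setup.

```r
# In the project-level .Rprofile (sketch, not quoted from the blogdown book):
# if a user-level profile exists, evaluate it first so that its settings
# are not lost when R picks up the project-level file instead.
if (file.exists("~/.Rprofile")) {
  base::sys.source("~/.Rprofile", envir = environment())
}

# Project-specific options would follow here.
```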
  • As a package developer, you can retrieve options in your code with getOption(), whose second argument is a fallback for when the user hasn’t set the option. Note that an option can be any R object.
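To make the two sides of an option concrete, here is a minimal sketch. The package name "mypkg", the option "mypkg.quiet" and the helpers is_quiet() and say() are all hypothetical; only options() and getOption() are real base R functions.

```r
# User side: set the option for the current session, or put the same
# line in .Rprofile to make it persistent across sessions.
options(mypkg.quiet = TRUE)

# Package side (hypothetical helpers): read the option and fall back
# to FALSE when the user has not set it.
is_quiet <- function() {
  isTRUE(getOption("mypkg.quiet", default = FALSE))
}

# Only emit messages when the user has not asked for quiet mode.
say <- function(...) {
  if (!is_quiet()) message(...)
  invisible(NULL)
}

say("This message is suppressed while mypkg.quiet is TRUE.")
```

Wrapping the lookup in one small helper keeps the default value in a single place, which is a design choice rather than a requirement.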
  • The use of options in the .Rprofile startup file is great for workflow packages like usethis, blogdown, etc., but shouldn’t be used for, say, arguments influencing the results of a statistical function.
  • Using environment variables
  • Environment variables, read via Sys.getenv() rather than getOption(), are often used for storing secrets (like GITHUB_PAT for the gh package) or the path to secrets on disk (like TWITTER_PAT for rtweet), or for non-secret settings (e.g. the browser to use for chromote).
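Reading an environment variable from package code might look like the sketch below. The variable name MYPKG_TOKEN and the helper get_token() are hypothetical; Sys.getenv() and its unset argument are base R.

```r
# Hypothetical helper: return the token stored in an environment
# variable, or fail with an actionable message when it is missing.
get_token <- function(var = "MYPKG_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop(
      "No token found. Set ", var, " in your .Renviron and restart R.",
      call. = FALSE
    )
  }
  token
}
```

Users would typically add a line such as MYPKG_TOKEN=... to their .Renviron so that the value is available in every session.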
  • Using credential stores for secrets
  • Although API keys, for instance, are often stored in .Renviron, they could also be stored in a standard and more secure location depending on the operating system. The keyring package lets you interact with such credential stores. You could either take it on as a dependency like e.g. gh, or recommend the user of your package to use keyring and to add a line like
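One possible way to combine a credential store with an environment variable fallback is sketched below. The helper get_secret(), the service name and the variable name are hypothetical; keyring::key_get() is the real keyring function for reading a stored secret, and it is only called when keyring is installed.

```r
# Hypothetical helper: prefer the operating system credential store,
# fall back to an environment variable, return NULL when neither is set.
get_secret <- function(service, env_var) {
  if (requireNamespace("keyring", quietly = TRUE)) {
    # key_get() errors when the service is unknown; treat that as "not found".
    secret <- tryCatch(keyring::key_get(service), error = function(e) NULL)
    if (!is.null(secret) && nzchar(secret)) {
      return(secret)
    }
  }
  secret <- Sys.getenv(env_var, unset = "")
  if (nzchar(secret)) secret else NULL
}
```

A user who stored a secret under the service "mypkg" would then get it back via get_secret("mypkg", "MYPKG_TOKEN"), while users without keyring keep working with .Renviron.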
  • Using a config file
  • The batchtools package expects its users to set up a config file somewhere if they don’t want to use the defaults. That somewhere can be one of several locations, as explained in the batchtools::findConfFile() manual page. Two of the possibilities are rappdirs::user_config_dir("batchtools", expand = FALSE) and rappdirs::site_config_dir("batchtools"), which refer to standard locations that differ depending on the operating system.
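The idea of searching several locations in order can be sketched as follows. This is not how batchtools implements findConfFile(); the helper find_conf_file(), the package name "mypkg", the file name "config.R" and the MYPKG_CONFIG variable are all hypothetical. The rappdirs calls are real and are only run when rappdirs is installed.

```r
# Hypothetical helper: return the first candidate path that exists,
# or NA when none of them does.
find_conf_file <- function(candidates) {
  candidates <- candidates[nzchar(candidates)]
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0L) NA_character_ else normalizePath(hits[[1L]])
}

# Usage sketch: an explicit path from an environment variable wins,
# then the user-level location, then the site-wide one.
if (requireNamespace("rappdirs", quietly = TRUE)) {
  find_conf_file(c(
    Sys.getenv("MYPKG_CONFIG"),
    file.path(rappdirs::user_config_dir("mypkg"), "config.R"),
    file.path(rappdirs::site_config_dir("mypkg"), "config.R")
  ))
}
```

Ordering the candidates from most to least specific lets a user override a site-wide default without touching it.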
  • The golem package offers its users the possibility to use a config file based on the config package.
  • In particular, for the email address, if the R environment variable EMAIL isn’t set, whoami uses a call to git to find Git’s global configuration. Similarly, the gert package can find and return Git’s preferences via gert::git_config_global(). In these cases where packages guess something, their guessing is based on the use of standard locations for such information on different operating systems. Unsurprisingly, in the next section, we’ll recommend using such standard locations when caching data.
  • To quote the Android developers’ guide again, “Persist as much relevant and fresh data as possible.” A package that exemplifies doing so is getlandsat, which downloads “Landsat 8 data from AWS public data sets” from the web. The first time the user downloads an image, the result is cached so that next time no query needs to be made. A very nice aspect of getlandsat is that it provides cache management functions.
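A caching pattern with accompanying management functions could be sketched as below. These helpers (cache_fetch(), cache_list(), cache_clear()) are hypothetical and are not getlandsat's API; the cache directory is passed in explicitly so that the caller decides where files go, and results are stored as .rds files purely for illustration.

```r
# Hypothetical helper: return the cached value for `key` if present,
# otherwise call `fetch()` once and store its result on disk.
cache_fetch <- function(key, fetch, cache_dir) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- fetch()
  saveRDS(value, path)
  value
}

# Cache management: let users inspect what has been stored ...
cache_list <- function(cache_dir) {
  list.files(cache_dir, pattern = "\\.rds$", full.names = TRUE)
}

# ... and delete it again.
cache_clear <- function(cache_dir) {
  invisible(file.remove(cache_list(cache_dir)))
}
```

In a real package, fetch would wrap the download and cache_dir would default to a standard location such as the one returned by rappdirs::user_cache_dir().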
  • If you hesitate between e.g. rappdirs::user_cache_dir() and rappdirs::user_data_dir(), use a GitHub code search to see how other packages use them.
  • rappdirs or not
  • To use an app directory from within your package you can use rappdirs as mentioned earlier, but also other tools.
  • Package developers might also like the hoardr package, which basically creates an R6 object building on rappdirs with a few more methods (directory creation, deletion).
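To compare what different tools return on a given machine, a small sketch like the following can help. The helper app_dirs() and the package name "mypkg" are hypothetical. tools::R_user_dir() ships with base R from version 4.0.0 (assumed here to be one of the other tools alluded to), and rappdirs::user_cache_dir() is only called when rappdirs is installed.

```r
# Hypothetical helper: list candidate cache directories for a package
# without creating anything on disk.
app_dirs <- function(pkg) {
  dirs <- list(base_r = tools::R_user_dir(pkg, which = "cache"))
  if (requireNamespace("rappdirs", quietly = TRUE)) {
    dirs$rappdirs <- rappdirs::user_cache_dir(pkg)
  }
  dirs
}

str(app_dirs("mypkg"))
```

The two tools do not necessarily return the same path, so a package should pick one and stick with it rather than switch depending on what is installed.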
  • In this blog post we presented ways of saving configuration options and data in a not so temporary way in R packages. We mentioned R startup files (options in .Rprofile and secrets in .Renviron, the startup package); the rappdirs and hoardr packages as well as an exciting similar feature in R devel; the keyring package. Writing in the user home directory can be viewed as invasive (and can trigger CRAN archival), hence there is a need for a good package design (asking for confirmation; providing cache management functions like getlandsat does) and documentation for transparency. Do you use any form of caching on disk with a default location in one of your packages? Do you know where your rhub email token lives?