Chapter 3 Configuration
Some details about your code depends on who is running it, or where you are running it, or requires personal secrets such as API keys or passwords. If details are hard-coded into your functions or scripts, it will be hard to share your scripts with collaborators but also for reproducibility.
One constant problem is scripts need to read files from somewhere, and everyone’s computer has different paths, or the paths differ when run in a different environment (e.g. on a high performance computer). We often see scripts with the configuration variables at the top of the script, and you have to edit those to your environment, or comment them out. But then those go into the git repository, so collaborators have to change them back. Even for your own scripts, you may want to change a file path depending on if you are running on your laptop or on the HPC for example. It’s hard to track down all the places where you have to change these parameters (although global search/replace helps). But it would be better if there was a single, consistent place to put all configuration values shared by scripts in a project
3.0.1 Solution
R has a built in feature exactly for this purpose - automaticaly reading a file called .Renviron
. Using files that start with “.” (so-called dot-files) for configuration is common in Linux (and these files are hidden). Note There are a couple of libraries to make this a bit more slick, but this document describes only features that are part of Base R. We can add description of how to incorporate those below
3.0.1.1 About Environment Variables
- Not a bad summary: https://en.wikipedia.org/wiki/Environment_variable
- (Mac/Linux) https://dev.to/gajesh/environment-variables-in-linux-and-mac-os-4j30
References for learning about Renviron and the R start-up process:
- Good summary of .Renviron : https://www.dartistics.com/renviron.html
- Within R/Rstudio, see the help topic ?base::Startup
- Jenny Bryan’s chapter on R start-up from her book https://rstats.wtf
Note that there is also a package called “startup” on CRAN which is an enhancement and not part of base R.
.Renviron is for setting values for script configuration. It uses the Operating System Environment to store these (e.g. not the same as an R environment), e.g. “environment variables” Any program can read or set an environment variable. One commonly used variable is $HOME which is your home directory.
3.1 Quick Points on Configuration
when R starts it looks for two files
.Renviron
set of environment variables listed in here. Written as shell script: use ‘=’ for assignment with no spaces, e.g. PATH=/home/me/data
.Rprofile
options for how you like to use R. Written as R code, use ‘<-’ or’=’ for assignment
If these files are in the local directory, it reads those and stops.
If not it looks for them in your home directory (global settings). It only reads one version (e.g. it doesn’t read local then also read global settings)
3.2 Using Renviron
Example .Renviron file for you or one of your collaborators. Each person makes their own copy of .Renviron. Note you can use existing operating system variables in this file. This example works for Linux/Mac. Note that this is not R code, and no spaces are allowed around the =
equals signs
# .Renviron file for project X
API_KEY=abc123
OUTPUT_PATH="$HOME/Documents/NEONOrganism/DataPull"
inside your R script or function (this does not need to be edited )
Sys.getenv('OUTPUT_PATH')
output_path <- Sys.getenv('API_KEY') api_key <-
Thie code would be placed in your scripts prior to calling functions that needed these values. For example a function that needed the output path
function(x,y, output_path){
saveStuff <-
# do stuff with x,y
# ...etc
file.path(output_path,"myoutput.csv")
output_file <-write.csv(x = data.frame(x,y), file=output_file)
return(output_file)
# ... etc
}
The main script to run then would
rnorm(100)
x <- rnorm(100)
y <-
# get the value of the output_path from the OS environment
Sys.getenv('OUTPUT_PATH')
output_path <-
# use the value of the output path without hard-coding a value that works only for you
saveStuff(x, y, output_path) saved_file <-
If your code requires configuration values and you use an .Renviron file, you note this in your project’s README.md file, and should provide an example of values, or perhaps a file like example.Renviron
Suggested Workflow to use lab code that needs configuration in Renviron:
- clone the repository
- in the README.md look to see which configuration settings are needed
and/or looked for an example file Renviron text file.
- create a new .Renviron file in the top level folder of the repository with the necessary configuration
- open a new Rstudio project in that folder (or start R) which would then read the .Renviron settings
- run the code
- repeat for different environment (e.g. laptop vs HPCC)
workflow when using configuration in your code with Renviron:
- put configuration in .Renviron and add Sys.getenv() calls to set those vars in R
- add those configuration vars in Renivron_example.txt with comments to describe possible values
- add a list of the configuration need in the README.md
- add .Renviron to your .gitignore file to keep it out of github
An optional but polite thing to do for other users of your code is to check for these values If the variable is not set, Sys.genenv
returns Null.
Can you set a reasonable default if it’s not set in Renviron? If not, send a me You could test that the value the user supplied (or default) actually work 4) throw exception if they don’t and print message which value needs to be set and where to look for guidance (see README)
A good example of this is the rredlist package from Bioconductor that pulls data from the IUCN Redlist API. This requires an API key, which the package suggested putting into .Renviron
and there is a function that checks if the key provided is Null, or if it’s in Renviron. See
https://github.com/ropensci/rredlist/blob/master/R/zzz.R#L36
Note the .Rprofile should be used to customize your R experience, not set configuration for a particular set of scripts. See https://www.statmethods.net/interface/customizing.html
3.3 File Paths STandard Configuration
TO BE WRITTEN
3.4 R Packages for Configuration
These other packages/features work to make configuration easier, or could be part of a configuration strategy. They are not necessary but may make it easier if your project’s configuration is complex.
startup: https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html
Package from Henrik Bengtsson to make it easier to work with .Renviron and Rprofile. Don’t know how well is plays with config
config: https://cran.r-project.org/web/packages/config/vignettes/introduction.html
Package from Rstudio to make it easy to have different Environments so you can easily switch out configuration, for example the paths/data connections you’d use for testing would be different for local and from HPCC. This is especially important when working with “production” or shared databases as you want to develop/test code on a local copy before running it on the central database or one used by an active web application
dotenv: older package that borrows from python. Does not use Renviron but a “dot-env” or “.env” file just like Renviron. I think using Renviron makes more sense but mention this as it comes up when searching. A package like this is required in Python because there is nothing built-in to handle config.
options: https://stat.ethz.ch/R-manual/R-devel/library/base/html/options.html
part of base R and a way to set options globally so you don’t have to send them as parameter to every function. This is complimentary to Renviron: set config values in Renviron, use the options() function to set global options var in R based on those env variables.