See also Nan Nourn’s example code with expanded bioclim options for Lab 8 & Lab 9
This lab is a continuation of species distribution models (SDMs) using the R package dismo: https://cran.r-project.org/web/packages/dismo/index.html.
The package contains a great vignette and you worked through the first
part of it in Lab 8. You should complete Lab 8’s portion of the vignette
with your species of choice before starting on Lab 9.
For this lab, you will work through Part 2 (Chapters 5-7: Model fitting, prediction, and evaluation) of the dismo vignette. You can find the vignette here: https://rspatial.org/raster/sdm/index.html.
Note that if you want to use a species that does not occur in the dismo vignette bioclim variable range (Mexico, Central & South America), then you need to pull in environmental data from other regions. To do this please see the “Environmental Data” section of the additional code provided in Nan Nourn’s example code with expanded bioclim options for Lab 8 & Lab 9 - this is also linked in Lab 8.
In Ch. 4 of the vignette, you were introduced to environmental predictors, specifically bioclimatic variables. The dismo vignette reads in WorldClim bioclimatic variables. WorldClim interpolates temperature and precipitation from weather stations to create a modeled representation of these data. Note that you can create bioclimatic variables from any gridded climate data source using the command biovars. For example, you could use satellite remote sensing products, which are direct measures rather than interpolated ones, such as NASA’s MODIS land surface temperature and NASA’s GPM (Global Precipitation Measurement). Or you could use PRISM data if you’re working in the United States.
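As a minimal sketch of what biovars does, the block below uses twelve made-up monthly values for a single location, computes two of the simplest variables by hand, and calls dismo::biovars only if dismo is installed. The numbers are illustrative assumptions, not real climate data.

```r
# Hypothetical monthly climate for one grid cell (Jan-Dec); values are made up
tmin <- c(10, 12, 14, 16, 18, 20, 22, 21, 19, 17, 15, 12)  # monthly min temp, deg C
tmax <- tmin + 5                                           # monthly max temp, deg C
prec <- c(0, 2, 10, 30, 80, 160, 80, 20, 40, 60, 20, 0)    # monthly precipitation, mm

# Two variables computed by hand to show what the definitions mean
tavg  <- (tmin + tmax) / 2
bio1  <- mean(tavg)   # BIO1: annual mean temperature
bio12 <- sum(prec)    # BIO12: annual precipitation
print(c(bio1 = bio1, bio12 = bio12))

# If dismo is available, biovars() derives all 19 variables at once
if (requireNamespace("dismo", quietly = TRUE)) {
  bio_all <- dismo::biovars(prec, tmin, tmax)
  print(bio_all)
}
```

For gridded data, the same three inputs would each be a 12-layer raster object (one layer per month) instead of a vector; see ?biovars for the accepted input types.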
The 19 Bioclimatic Variables are listed below. Metadata for these data can be found here.
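UNITS: temperature is stored as °C × 10 (for example, a value of 231 means 23.1 °C), and precipitation is in mm. This scaling applies to the WorldClim version 1.4 data; check the metadata if you use a newer version.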
BIO1 = Annual Mean Temperature
BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))
BIO3 = Isothermality (BIO2/BIO7) (×100)
BIO4 = Temperature Seasonality (standard deviation ×100)
BIO5 = Max Temperature of Warmest Month
BIO6 = Min Temperature of Coldest Month
BIO7 = Temperature Annual Range (BIO5-BIO6)
BIO8 = Mean Temperature of Wettest Quarter
BIO9 = Mean Temperature of Driest Quarter
BIO10 = Mean Temperature of Warmest Quarter
BIO11 = Mean Temperature of Coldest Quarter
BIO12 = Annual Precipitation
BIO13 = Precipitation of Wettest Month
BIO14 = Precipitation of Driest Month
BIO15 = Precipitation Seasonality (Coefficient of Variation)
BIO16 = Precipitation of Wettest Quarter
BIO17 = Precipitation of Driest Quarter
BIO18 = Precipitation of Warmest Quarter
BIO19 = Precipitation of Coldest Quarter
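To make the units and the derived definitions concrete, here is a small sketch with made-up values for three hypothetical sites, stored in the °C × 10 convention described above. It converts BIO1 to °C and rebuilds BIO7 and BIO3 from their formulas in the list.

```r
# Hypothetical extracted values for three sites (numbers are made up).
# Temperature variables use the deg C x 10 storage convention; precipitation is in mm.
sites <- data.frame(
  bio1  = c(231, 187, 95),    # annual mean temperature, deg C x 10
  bio2  = c(102, 118, 87),    # mean diurnal range, deg C x 10
  bio5  = c(312, 298, 224),   # max temp of warmest month, deg C x 10
  bio6  = c(148, 62, -41),    # min temp of coldest month, deg C x 10
  bio12 = c(2140, 860, 610)   # annual precipitation, mm
)

# Convert the stored annual mean temperature to degrees Celsius
sites$bio1_degC <- sites$bio1 / 10

# BIO7 = BIO5 - BIO6 (temperature annual range, still in deg C x 10)
sites$bio7 <- sites$bio5 - sites$bio6

# BIO3 = (BIO2 / BIO7) x 100 (isothermality, a unitless percentage)
sites$bio3 <- sites$bio2 / sites$bio7 * 100

print(sites)
```

Because BIO3 is a ratio of two temperature ranges, the × 10 scaling cancels out and the result is the same whether or not you convert to °C first.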
Work through Part 2 (Ch. 5-7: Model fitting, prediction, and evaluation) of the vignette with the example dataset. Choose a species other than the Bradypus species used in the vignette; it is easiest to use the species you chose in Lab 8, since you already completed that section. Note that the extraction of the environmental predictors occurs in Chapter 4 of the vignette, so if you choose a different species than the one you used in Lab 8, you will need to go back to earlier portions of the vignette. Re-create Ch. 5-7 of the vignette with your species of interest, constraining your study area to the species’ known range. Look up the known range from an independent source and remove outliers such as zoo records and incorrect locations.
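One simple way to constrain records to a known range is a bounding-box filter. The sketch below uses made-up coordinates and an assumed bounding box; the object names (occ, range_box, occ_clean) are hypothetical, and your own box should come from the independent range source you looked up.

```r
# Hypothetical occurrence records (lon/lat in decimal degrees); made up for illustration
occ <- data.frame(
  lon = c(-75.2, -68.9, -80.1, 2.35, -77.0, NA),
  lat = c(4.6, -12.3, 8.9, 48.86, 38.9, 1.0)
)

# Assumed bounding box for the species' known range (replace with your own)
range_box <- c(xmin = -95, xmax = -35, ymin = -35, ymax = 20)

# Drop records with missing coordinates
occ <- occ[!is.na(occ$lon) & !is.na(occ$lat), ]

# Keep only records that fall inside the box (removes, e.g., zoo records elsewhere)
in_range <- occ$lon >= range_box[["xmin"]] & occ$lon <= range_box[["xmax"]] &
            occ$lat >= range_box[["ymin"]] & occ$lat <= range_box[["ymax"]]
occ_clean <- occ[in_range, ]

# Remove exact duplicate coordinates
occ_clean <- unique(occ_clean)

print(occ_clean)
```

A bounding box is a coarse filter: records inside the box can still be wrong, so it is worth plotting the cleaned points on a map and comparing them with the published range.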
For more background information on AUC and Spatial Sorting Bias, see the additional text below the QUESTIONS.
NOTE 1: A reminder that if you want to use data from WorldClim outside the dismo vignette data, you should download it; see the “Environmental Data” section of the additional code provided in Nan Nourn’s example code with expanded bioclim options for Lab 8 & Lab 9 - this is also linked in Lab 8.
The dismo vignette may be using an older version of the WorldClim data (version 1.4). These older data can be downloaded directly using getData, but they are outdated, so you wouldn’t want to publish anything with them. See above for how to obtain the updated data (version 2.1). For the lab exercises, it’s ok to use version 1.4; you can download version 1.4 data for another area like this:
# Create the folders (directories) "data" and "lab9" - If they exist already, this command won't over-write them.
library(raster)
# Use the getData command
?getData
# World-wide, all bioclim variables, 10 minutes of a degree resolution
w_data_world<-getData('worldclim', var='bio', res=10)
## Warning in getData("worldclim", var = "bio", res = 10): getData will be removed in a future version of raster
## . Please use the geodata package instead
plot(w_data_world)
# Region-specific (lon and lat are centered on the area); 0.5 minutes of a degree resolution
w_data_europe<-getData('worldclim', var='bio', res=0.5, lon=5, lat=45)
## Warning in getData("worldclim", var = "bio", res = 0.5, lon = 5, lat = 45): getData will be removed in a future version of raster
## . Please use the geodata package instead
plot(w_data_europe)
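The warnings in the output above point to the geodata package as the replacement for getData. The following is a hedged sketch of what that could look like, assuming geodata is installed and an internet connection is available. The wrapper name fetch_worldclim_bio is hypothetical, and you should check ?worldclim_global for the arguments and the WorldClim version your installed geodata returns.

```r
# Hypothetical helper wrapping geodata's WorldClim downloader.
# Assumes the geodata package is installed and an internet connection is available;
# nothing is downloaded until the function is called.
fetch_worldclim_bio <- function(path = tempdir(), res = 10) {
  if (!requireNamespace("geodata", quietly = TRUE)) {
    stop("Install the geodata package first: install.packages('geodata')")
  }
  # Global bioclim layers at `res` minutes of a degree, saved under `path`
  geodata::worldclim_global(var = "bio", res = res, path = path)
}

# Example call (left commented out because it triggers a download):
# bio_world <- fetch_worldclim_bio(path = tempdir(), res = 10)
# terra::plot(bio_world[[1]])
```

Note that geodata returns terra objects rather than raster objects, so code written for the raster package may need converting before it works with them.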
NOTE 2: In Ch. 5-7 of the vignette, when creating the reduced model with a subset of bioclimatic variables (e.g., bio1, bio5, bio12), you can choose to select different bioclimatic variables - see the list of bioclimatic variables above. You should choose a different subset only if the results of your full model (containing the fuller set of bioclimatic variables) suggest a different set, or if you have a priori knowledge of the species-climate relationship.
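To illustrate how a full model can suggest a subset, here is a sketch on simulated data. It is not the vignette’s code: it uses a logistic GLM as a stand-in, and every value, including which predictors "matter", is an assumption built into the simulation.

```r
set.seed(42)

# Simulated presence/background table; a stand-in for predictor values
# extracted at presence and background points (all values are made up)
n <- 400
env <- data.frame(
  bio1  = rnorm(n, mean = 220, sd = 30),
  bio5  = rnorm(n, mean = 300, sd = 25),
  bio12 = rnorm(n, mean = 1500, sd = 400),
  bio16 = rnorm(n, mean = 600, sd = 150)
)

# Assumption of this simulation: presence depends on bio1 and bio12 only
p <- plogis(-1 + 0.03 * (env$bio1 - 220) + 0.002 * (env$bio12 - 1500))
env$pa <- rbinom(n, size = 1, prob = p)

# Full model (all predictors) and reduced model (a chosen subset)
m_full    <- glm(pa ~ bio1 + bio5 + bio12 + bio16, family = binomial, data = env)
m_reduced <- glm(pa ~ bio1 + bio5 + bio12, family = binomial, data = env)

# Coefficient table of the full model: which predictors carry signal?
print(summary(m_full)$coefficients)

# AIC penalizes extra parameters; lower is better when the data are the same
print(AIC(m_full, m_reduced))
```

With real data, the coefficient table is only one line of evidence; correlated bioclimatic variables can mask each other, which is why a priori knowledge of the species-climate relationship is also a valid basis for the subset.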
Show your work (code and output) for this portion of the vignette (Chapters 5-7). You can either add on to Lab 8’s .Rmd file of the vignette that you already produced to create a longer PDF or HTML, or you can hand in a new PDF or HTML with just Lab 9’s section included. Either way, please hand in a PDF or HTML produced from your .Rmd file to show all your work for Ch. 5-7.
After completing the vignette with your species of choice, answer the following QUESTIONS:
Describe the differences you observe in the mapped predictions of the full model and the reduced model. Which mapped prediction is closer to the known distribution of the species? Provide an image (e.g., a screenshot) of the species’ range map from an independent source, and cite that source.
Which model, the full or reduced model, performs better? Why? Include plots and model statistics to help with your explanation.
Extra information on AUC and Spatial Sorting Bias (SSB): In general, AUC improves with larger geographic extents. As with most of these metrics, you should only compare them across models that have the same input data (meaning the same occurrence records; the same number of rows of unique presence and absence locations). Splitting data into training and testing sets makes the evaluation sensitive to the distances in space between the testing and training data. To remove this bias, you can subset the data into training and testing with pairwise distance sampling (pwdSample in the dismo package).
Type the following to see the full details on the pwdSample command:
library(dismo)
?pwdSample
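For intuition about what AUC measures, the sketch below computes it in base R from made-up model scores using the rank (Mann-Whitney) formulation. The helper auc_rank and the simulated scores are hypothetical illustrations, not part of dismo.

```r
set.seed(1)

# Made-up model scores at test presence and test background (absence) points
pres_scores <- rnorm(50, mean = 0.7, sd = 0.15)
bg_scores   <- rnorm(200, mean = 0.4, sd = 0.15)

# AUC via ranks: the probability that a randomly chosen presence
# scores higher than a randomly chosen background point
auc_rank <- function(pres, bg) {
  n_p <- length(pres)
  n_b <- length(bg)
  r <- rank(c(pres, bg))   # ties receive average ranks
  (sum(r[seq_len(n_p)]) - n_p * (n_p + 1) / 2) / (n_p * n_b)
}

auc <- auc_rank(pres_scores, bg_scores)
print(auc)

# Sanity checks on the helper
print(auc_rank(c(0.9, 0.8), c(0.1, 0.2)))   # perfect separation gives 1
print(auc_rank(c(0.5, 0.5), c(0.5, 0.5)))   # all ties gives 0.5
```

Because AUC depends only on how presences rank against background points, background points far from any presence are easy to rank low, which is one way to see why larger extents tend to raise AUC. In dismo, evaluate() reports AUC for a fitted model.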
This work is licensed under CC-BY 4.0, 2020, 2022 by Phoebe Zarnetske.