Lab 8: Intro to SDMs

RMarkdown version of this Lab

See also Nan Nourn’s example code with expanded bioclim options for Lab 8 & Lab 9

In week 9 of the course you viewed recordings of the [GBIF Frontiers of Biodiversity Informatics and Modelling Species Distributions Symposium](https://www.amnh.org/research/center-for-biodiversity-conservation/convening-and-connecting/events-and-symposia/gbif-frontiers-panel-and-symposium] held at the American Museum of Natural History in 2015, which provided some examples of research involving data from GBIF and their use in SDMs. In this lab, you will be introduced to species distribution models (SDMs) using the R package, dismo: https://cran.r-project.org/web/packages/dismo/index.html. The package contains a great vignette and you’ll be working through the first part of it in this lab. Over the next two labs, you will continue working through the vignette.

For this lab, you will work through Part 1 (Chapters 1-4: Introduction, Data preparation, Absence & background points, Environmental data) of the dismo vignette. You can find the the vignette here: https://rspatial.org/raster/sdm/index.html.

First, work through the vignette Part 1 (Chapters 1-4). Note that if you want to use a species that does not occur in the dismo vignette bioclim variable range (Mexico, Central & South America), then you need to pull in environmental data from other regions. To do this please see the “Environmental Data” section of the additional code provided in Nan Nourn’s example code with expanded bioclim options for Lab 8 & Lab 9 - this is also linked in Lab 9.

Second, answer the dismo vignette Exercises 2.8 (#1-5; and copied below) using a species of your choice (not Solanum acaule; select a species with at least 50 occurrence records - it doesn’t have to be the African elephant). Note that Question 4 asks you to use gmap to make a basemap - this may not work because it requires a link with Google Maps. If gmap doesn’t work, use one of the various techniques you learned in previous labs for creating a map for the area that includes your species. Questions 6 & 7 are optional and will not be graded.

Here are the questions from the dismo vignette (modified slightly given that ‘geocode’ may not work):

Exercises 2.8

  1. Use the gbif function to download records for the African elephant (or another species of your preference, try to get one with between 10 and 100 records). Use option “geo=FALSE” to also get records with no (numerical) georeference.

  2. Summarize the data: how many records are there, how many have coordinates, how many records without coordinates have a textual georeference (locality description)?

  3. ‘geocode’ will probably not work because the it depends on a package that is linked with Google and hasn’t been updated, so skip the georeferencing task. Instead just use the subset of data that are already geocoded.

  4. Make a simple map of all the geocoded records. **See note above; gmap may not work so just use another basemap. For example, see Lab 4 which has ideas for how to map with tmap or ggmap.

  5. Do you think the observations are a reasonable representation of the distribution (and ecological niche) of the species?

6 & 7 are OPTIONAL:

  1. Use the ‘rasterize’ function to create a raster of the number of observations and make a map. Use “wrld_simpl” from the maptools package for country boundaries.

  2. Map the uncertainty associated with the georeferences. Some records in data returned by gbif have that.

Third, choose one of the types of SDMs from the list below - each student will select a different model, so indicate your choice soon on the Lab 8 Google Doc so that we don’t have duplicates.

Use the template below to generate a standard summary R Markdown-generated HTML file. This will help you get up to speed on next week’s topic (model types), and when posted on D2L, will enable all of you have a quick guide to different model types. Hint: scan forward in the dismo vignette for descriptions of some of these. Otherwise, check the papers uploaded for next week and Google Scholar. In addition to the dismo R package, some of these models are also run through the R package, biomod2: https://rdrr.io/cran/biomod2/man/BIOMOD_Modeling.html or flexsdm: https://sjevelazco.github.io/flexsdm/. Below is a list of SDM types. If you have another model in mind that is not on the list, ensure it is a SDM. For those of you handing in the assignment, pick one model each to write up.

generalized linear model (GLM)

generalized additive model (GAM)

Maxent

Bioclim (or surface range envelope)

GARP

boosted regression (decision) trees (BRT or GBM)

random forests

classification trees (CART)

mahalanobis distance

generalized dissimilarity models (GDM)

artificial neural network (ANN)

DOMAIN

Support Vector Machine (SVM)

Guassian process models

What to hand in as 2 separate files:

  1. You will need to hand in a PDF or HTML produced from R Markdown showing your work on the vignette with a different species, including any images, plots, code, and text for answers to Exercises 2.6 (#1-5).

  2. A separate HTML file produced from the R Markdown template below, describing one type of species distribution model. I will be posting these submissions for you all to see on D2L, so your SDM model summary will become a resource for the class.

Template for your summary of a Species Distribution Model

Follow the R markdown template below to include the following:

Model Name: Name the model type

Model Category: Regression/Machine Learning/Hybrid/etc.

Model Description: General description of how this model works.

Model Assumptions: Any assumptions you’d need to know about the model and about the data’s distribution, etc.

Model Response Data: A description of the types of data used in this model (Presence/absence, Presence-Only, Pseudo-Absence, Abundance, etc.)

Model Explanatory Data: A description of the types of data used as predictors/explanatory variables in this model (categorical okay? only continuous?)

Model Links and Use with R: an html link to R packages on the CRAN site that employ this model (with brief description), and links to any helpful websites that officially describe this model

Example Papers: list at least 2 full references for publications that use this model to generate a SDM.

Example with R: Provide annotated code to load the required package(s) to run the model.

library(dismo)     # package to run the model

Creative Commons License
This work is licensed under a Licensed under CC-BY 4.0 2020, 2022 by Phoebe Zarnetske.