Lab 1: R, Data Management, ggplot2, Stommel Diagram

Download RMarkdown Source code for this lab
Refer to the Welcome page of this website for the “Requirements” for working on each lab (run any script from your course folder).

This lab has 3 Parts in addition to your participation in the Lab Community Forum for this lab (link to the Google Shared Drive on D2L in the description for “LABS”). Note that you only need to hand in Part 3 (Stommel Diagram), due in the D2L Lab 1 Assignment Box by the Lab Due Date (refer to the Syllabus for exact date).

At the end of Lab 1 is Homework in preparation for Lab 2.

Part 1: R & Data Management

R (https://www.r-project.org/) is an elegant scripting language. With R you can do just about anything to analyze or plot your data. You can even do things without data (e.g., theoretical modeling). One of the best things about R is that scientists of all sorts contribute to it by providing R packages (i.e., sets of code) that focus on a set of analyses (like time series analysis, spatial analysis, etc.). And it’s free. So chances are, if you are looking to do something with your data, there’s at least one package out there that can do it! Sometimes it might take time to find it, but using a variety of keywords has always turned up some portion of code that is useable for a data application I’ve needed. And if you don’t find what you’re looking for, consider writing a package as a contribution.

R RESOURCES There are numerous tutorials, books, short courses, etc. out there to help you learn R. See the “Resources” section in the course D2L site for just a few relevant links to help you beyond this course. There are also resources posted on the MSU SpaCE Lab R Guide website at: https://space-lab-msu.github.io/r_guide/.
The exercises below draw on bits and pieces from some of these sources, but also some aspects that I think are particularly important to understand about R. Even if you’ve worked with R before, you may find some of these descriptions and codes useful. I’m always learning new things in R from a variety of sources - from my students, postdocs, and colleagues, and from websites, social media, scientific papers, and conferences.

FAIR Principles

Data should be Findable, Accessible, Interoperable, and Reusable (FAIR). Read Wilkinson et al. (2016) to become familiar with these standards, as we will be referring to them throughout the course.

Wilkinson, M. D., M. Dumontier, Ij. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-Beltran, A. J. G. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. C. ’t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3:160018. DOI:10.1038/sdata.2016.18.

You can also review the FAIR Principles at this website: https://www.go-fair.org/fair-principles/

DATA MANAGEMENT is key to your success as a scientist or practitioner.

One of the most important (and time consuming) steps with your data is organizing it. The readings for this lab will focus on that. Rather than work in Excel or another point and click program, R will enable you to track your changes from the very beginning. This is extremely important because chances are, you’ll need to re-run your scripts at some point during your analysis because you’ve realized you made some mistake along the way. It also vastly reduces human error because it provides you with a track record for reproducible research.

Please make sure you have read Borer et al. (2009) and White et al. (2013) and are familiar with proper data entry and management. If you follow these guidelines I guarantee you’ll be more efficient. For more in-depth information, see this publication from the British Ecological Society: “A Guide to Data Management in Ecology and Evolution”: https://www.britishecologicalsociety.org/wp-content/uploads/Publ_Data-Management-Booklet.pdf. The papers are on D2L, as are links to other data management resources in our Resources folder.

Elizabeth T. Borer, Eric W. Seabloom, Matthew B. Jones, and Mark Schildhauer 2009. Some Simple Guidelines for Effective Data Management. Bulletin of the Ecological Society of America 90:205–214. http://dx.doi.org/10.1890/0012-9623-90.2.205

White, E., E. Baldridge, Z. T. Brym, K. J. Locey, D. J. McGlinn, and S. R. Supp. 2013. Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution 6. https://ojs.library.queensu.ca/index.php/IEE/article/view/4608

R & RSTUDIO:

See the main page for this website to learn more about downloading R (https://www.r-project.org/) and RStudio (https://www.rstudio.com/). Please make sure you’re using the most recent version of R. R developers have funny names for R versions. Can you figure out their theme? You will use RStudio for lab (which runs R and has a nice layout… it also contains R Markdown which we will use for generating Lab Assignment reports; see related homework ahead of Lab 2 at the end of this document).

Some helpful tips when starting up R:

# Clear all existing data
rm(list=ls())

# Close graphics devices
graphics.off()

# Check the directory you're working in. If it's not your main lab course folder, 
# make sure the .R or .Rmd file you're working on is moved to that location (you will 
# have to close the .R or .Rmd file if you've opened it in RStudio so it is moved correctly). 
getwd()

For purely reproducible research, it’s not recommended to set your working directory (“setwd()”). This is because any code you write should be able to run from any computer, reading in data from a raw data folder housed on some server or cloud location, potentially through a data repository. For reproducible research you would not want to store the data on your personal computer. However, working from your personal computer while preparing data analysis ahead of publication, or working on these Labs is okay (just make sure you back it up regularly).

Get in the habit of removing any spaces in your pathnames and filenames. Use “_” instead. No special characters; just ASCII characters (American Standard Code for Information Interchange).

Set paths with file.path() to tell R the location where the data exist (these directories should already exist - see the Welcome page of this website). file.path() creates reproducible code that can be run on multiple operating system because Mac and Windows have different directory separators. The code below assumes the folders “data/lab1” [“lab1” is a folder inside “data”] and “output” already exist where this script is run.

# Set paths
data_path<-file.path("data","lab1")
output_path<-"output"

# Install and load appropriate packages used in your script.
# In this case install ggplot2 package then load it - you will need to comment out 
# (i.e., remove the "#") before running install.packages() below.

#   install.packages('ggplot2',repos = "http://cran.us.r-project.org") 

# You may be required to pick a source – pick one near where you are located
# Load ggplot2: 'https://ggplot2.tidyverse.org/reference/'
library(ggplot2) 

# If you install several packages at once, you can use code like this to 
# install and load them:

for (package in c("dplyr", "ggplot2", "labdsv")) {
  if (!require(package, character.only=T, quietly=T)) {
    install.packages(package)
    library(package, character.only=T)
  }
}
# *hint: if you're using R Markdown to knit with these commands, it may
# be helpful to install the package locally first (run this before knitting if you're 
# knitting in R Markdown - more on that in Lab 2)

If you use ggplot2, you may want to change the default settings for plotting (which is a gray background). I usually make these changes below. There are also options for adjusting the orientation of text on the axis labels. See the ggplot2 theme documentation: https://ggplot2.tidyverse.org/reference/theme.html for how to make your own theme.

theme_set(theme_bw(10))
theme_update(axis.text.x = element_text(size = 10),axis.text.y = element_text(size = 10))

Managing your workspace

# As you go, save your workspace
save.image(file.path(data_path,"lab1.RData")) 
# Load the workspace (e.g. if you start back up but don’t want to re-run 
# everything you already worked on)
load(file.path(data_path,"lab1.RData"))

As you go, save your R script. (Refer to shortcut in File-Save menu).

Part 2: Practice with R: data manipulation, and ggplot2

You don’t need to hand in anything for this portion of the lab.

Ahead of the course, I recommended working through this R tutorial: https://datacarpentry.org/R-ecology-lesson/index.html. Take some time to look through it if you haven’t already. You won’t be able to finish the whole tutorial in one lab, but if it looks unfamiliar to you, I recommend working through those sections in the early part of the semester (you can skip the SQL section). Note that some of the later lessons require data merged in previous sections and all the original data is found via a link on the first page. While I don’t require you to hand anything in, the tutorial is a good reference especially for data manipulation and ggplot2.

Part 3: Stommel Diagram

In lecture for Week 2 we go over the history of scale as it relates to ecology and natural science. For this portion of your lab, create a Stommel Diagram for the system you study (or some component of the system you study). It doesn’t need to be drawn in 3D as in Stommel’s original diagram. It can be 2D and look more like the Boreal Forest Ecosystem Diagram.

Label your axes and place approximate values on your tick-marks. X-axis=space, Y-axis=time. Pick at least one response variable to plot as your Z-axis. Label the phenomena associated with your response variable. Use the examples from lecture as a guide.

On your diagram, include a depiction of data sources as in Kaiser et al. 2005:

Plot 5:

You can see that the ships cover a narrower range within the satellite range but also overlap the finer scales of the moorings).

Write a paragraph describing your diagram and pointing out key relationships. Include relevant in-line citations and a list of references. If you hand-draw the diagram, scan it in or take a photo of it to make it digital and embed it in your Word document, PDF, or PowerPoint along with your written description. Submit a single file to the D2L Assignment Box for Lab 1. Feel free to use the Lab Community Forum for discussions about your Stommel diagrams as you prepare them.

Homework in preparation for Lab 2:

R MARKDOWN: R Markdown is an authoring framework for R code and output. It’s handy because it produces nice summaries of your work in HTML or PDF or other formats. This website was made in Markdown via RStudio. a. Watch the intro video here: http://rmarkdown.rstudio.com/lesson-1.html b. Work through the tutorial and cover “How it Works”, “Code Chunks”, and “Markdown Basics”. I recommend using “KnitR” when you want to publish (i.e., click on “knit to HTML”, “knit to PDF”, etc. in the pull-down menu in RStudio).

Creative Commons License
This work is licensed under a Licensed under CC-BY 4.0 2020, 2022 by Phoebe Zarnetske.