1  Computer setup and first R script

1.1 Goals

  • Learn about your computer and how to organize its files.
  • Install R, an open source programming language.
  • Begin to use R effectively.

1.2 Background

In this project, you will begin to use tools for data exploration and analysis. These tools, and your computer, are essential for making sense of our complicated world. Ecological systems are in many ways more complicated than the systems studied in physics or engineering, and ecological data are noisy and often unpredictable. We need computers to make sense of them.

We will use open-source, free software to help us understand ecological systems. You will download and install R, known formally as the R programming language and environment. You will also install RStudio, an integrated development environment (IDE) designed to be used with R. The R interface looks a little different on Windows, Macs, and Linux computers; RStudio will allow us all to share an identical appearance and user experience. You will use R by opening RStudio.

1.2.1 Understanding your computer

First, we need to make sure we know some basics about computers.

1.2.2 Folders, directories, and directory structure

On your computer,

  • a folder is a place to put files and other folders;
  • “a directory is a file system cataloging structure that contains references to other computer files, and possibly other directories” (Wikipedia); and
  • a folder is a part of your computer’s directory system.

If you have not already done so, you need to organize your computer.

Do these:

  • Make different folders (or directories) for different classes.
  • Make different folders for different kinds of content (e.g., data, articles, your writing).

In addition,

  • do not use the Desktop or Downloads to store anything important, and
  • do not mix personal content and course content.

Watch one of these to help you understand how your computer is organized. This will save you hours of time and frustration throughout your life.

1.2.3 File types and file name extensions

A file extension is the short label at the end of a file name, after the final dot, that identifies the type of file. You will need to be able to see file extensions on your computer. Common examples:

  • .jpg or .jpeg is the extension for JPEG image files.
  • .docx is the extension for Microsoft Word document files.
  • .txt is the extension for plain text files.
  • .xlsx is the extension for Microsoft Excel files. You will work with these.
  • .csv is the extension for comma-separated values (CSV) text files; spreadsheet applications can produce these. You will work with these.
  • .R is the extension for R scripts. You will work with these.

Note that, by default, Windows hides these from the user. Here is how we fix that on Windows. (Mac users can skip this.)

On Windows 10, you need to change two settings in File Explorer to see file extensions and hidden items:

  1. Open File Explorer. If you do not have an icon for it in the taskbar, click Start, click Windows System, and then click File Explorer.

(Figure: Finding File Explorer on Windows.)

  2. Click the View tab in File Explorer.

(Figure: View tab in File Explorer.)

  3. Click the box next to File name extensions to see file extensions.
  4. Click the box next to Hidden items to see hidden files.

You should now be able to see file name extensions (e.g., .docx, .R) and hidden files.

1.3 Installing R

Armed with a little knowledge of your computer, you are now ready to install R.

1.3.1 What and Why

We will use the open-source R programming language and environment for doing science, including:

  • exploring data,
  • testing ideas, and
  • modeling dynamical systems like populations of rare and endangered species, disease dynamics, and ecosystems.

You need to download the application to get started.

  • R is free.
  • R runs on all operating systems.
  • R is powerful.

1.3.2 How-to videos (optional)

The following videos might be useful, but they are not required! The next sections describe (in words) how to install R and RStudio.

Optional videos:

  1. Installing R and RStudio
  2. Set up your computer and start to use R via RStudio
  3. Saving figures made with ggplot()
  4. Introduction to ggplot2 (20 min)

1.3.3 How-to instructions

  1. Find out what version of operating system (OS) your computer is using (e.g., Windows 10 or 11, Linux Ubuntu, macOS Sonoma, or another recent release). This will help determine which version of R to use. If you are using a Mac, you also need to determine which kind of processor your computer uses (an Intel processor or an Apple silicon chip, M1 or later). Check this under “About This Mac” in the Apple menu in the upper left corner of your screen.
  2. Google R (just the letter “R”) to navigate to the R Project for Statistical Computing.
  3. Find the link to CRAN (the Comprehensive R Archive Network) in the menu on the left side of the page.
  4. Find and navigate to a CRAN site near you (e.g., United States or Midwest US or Ohio). These are called mirrors because they are copies of each other.
  5. Download and install the latest release of R for your operating system (Mac, Windows, or Linux).
  6. As of January 2023, it was version 4.2.2. It is a single file to download. You should use whatever the latest version is.
  7. In January 2023, it worked like this:
    1. Mac - select R-4.2.2.pkg (for an Intel processor) or R-4.2.2.arm64.pkg (for Apple silicon, M1 or later)
    2. Windows - select the “base” package, then R 4.2.2
    3. Linux - select your flavor, and follow instructions.

1.3.4 Where

  • R (and RStudio) will be installed alongside all your other applications.

1.3.5 Installing RStudio

RStudio is an integrated development environment (IDE) designed to be used with R. The R interface looks a little different on Windows, Macs, and Linux computers; RStudio allows us all to share an identical appearance and user experience.

  1. Navigate online to the RStudio website.
  2. Under Products find RStudio IDE.
  3. Select and download the Open Source Edition (RStudio Desktop). You want the desktop version (not the server version).
  4. Follow instructions to install on your computer.

1.3.6 R ≠ RStudio

RStudio is an interface to R. When we use R in this class, we will use RStudio as our interface, but you need to realize that we are primarily using R. We don’t need RStudio to do the work, but it makes my job as instructor easier, and makes it a little easier for you as well. So, when you start to use R, just open RStudio, and it will load R for you.

In this class, you will use the English language to write about ecology, and the R language to explore and analyze ecological data.

1.4 Set up two folders for this class

  1. Set up a folder for this class. Call it “BIO209W”. Do not name it “BIO_209W” or “Bio209W” or “ecology”. Just “BIO209W”. It will help us. I don’t really care where you put this folder, but you should have a place for all of your classes, and you could put it there. Don’t put it in “Downloads”.

  2. Next, set up a subfolder inside BIO209W called “Rwork”. Do not call it “RWORK” or “R_work” or anything else - just “Rwork”. It will help us. A lot. Rwork will be your working directory, which is where R looks automatically for data and where it automatically puts output.
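Creating these folders with your computer's file manager (File Explorer or Finder) is all you need. As an optional alternative, the sketch below makes both folders from inside R with dir.create(). It assumes a hypothetical parent folder named Courses in your home directory (~); change that path to wherever you actually keep your class folders.

```r
# Hypothetical parent folder: "Courses" inside your home directory (~).
# Edit this path to match where you keep your class folders.
parent <- path.expand(file.path("~", "Courses"))

# Build the full path to Rwork inside BIO209W
rwork_path <- file.path(parent, "BIO209W", "Rwork")

# recursive = TRUE also creates BIO209W (and Courses) if they do not exist yet;
# showWarnings = FALSE keeps R quiet if the folders are already there.
dir.create(rwork_path, recursive = TRUE, showWarnings = FALSE)

# Check that the folder now exists (TRUE means it does)
dir.exists(rwork_path)
```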

1.5 Working in R

To work in R in this class, we will open RStudio. It will open R for you.

1.5.1 Set your “working directory”

In this section, we learn what the “working directory” is and how to set it.

Your working directory is where R looks to find stuff by default. R can look anywhere you tell it to, but it looks automatically in your “working directory”.

In this class, you should always use “Rwork” as your working directory.

To find out which folder R is currently using as my working directory, I use getwd() (short for “get working directory”):

getwd()
[1] "/Users/stevenmh/Courses/intro_to_ecology"

This tells me what R is using as my working directory right now. My current working directory will differ from yours.

You should run this code, and see what you get. What do you get when you run this code (getwd())?

Next, use the Session pull-down menu in RStudio: choose Set Working Directory, then Choose Directory…, and always choose Rwork (which should be inside BIO209W).
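Behind the scenes, that menu runs the function setwd() (“set working directory”), and you can also call it yourself. In this sketch the path to Rwork is hypothetical (a Courses folder in your home directory), so replace it with the real location of your Rwork folder; the dir.exists() check simply avoids an error if the path is wrong.

```r
# Remember where R is looking now, so we can compare
before <- getwd()

# Hypothetical location of Rwork; replace it with your own path.
# file.path() joins folder names with "/", which works on Windows, Mac, and Linux.
my_rwork <- file.path(path.expand("~"), "Courses", "BIO209W", "Rwork")

# setwd() stops with an error if the folder does not exist, so check first
if (dir.exists(my_rwork)) {
  setwd(my_rwork)
} else {
  message("Folder not found: ", my_rwork, " (check the path)")
}

# Where is R looking now?
getwd()
```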

1.5.2 Create a project

RStudio allows you to organize your work in projects. Here we create a new project that we will call Rwork.

This requires that you already have a folder called “BIO209W” and a subfolder in that called “Rwork”.

  1. In RStudio, use the File menu to select “New Project”, and select “Existing Directory”.
  2. In that dialogue, navigate to “Rwork”, and select “Open.”
  3. Select “Create Project”.

RStudio will now switch the working directory to your Rwork folder. You can see evidence of this in two places: the project menu in the upper right corner of the RStudio window should show Rwork, and the Files tab in the lower right pane should show the contents of Rwork.

From now on in this course, make sure that you are using this project. You can open it by double-clicking the “Rwork.Rproj” file in your Rwork folder, or by opening RStudio and selecting the Rwork project from the project menu in the upper right corner.

1.5.3 Scripts: Start and save a script

Goal: Write a script in R and learn several important programming operations.

After you have prepared your computer as above, you are ready to begin exploring and modeling data. We do that using scripts.

A script is a plain text file that contains the code, and your own comments, that you use to address your questions.

The following steps will let you accomplish the goal of this section. One of the optional videos listed above shows me opening R with RStudio and starting a new script.

Writing comments. Anything following a hash mark (#) on a line is a comment, and R ignores it. You and I will use lots of comments to ask ourselves questions and to describe what we are trying to do.

  • Use the File menu in RStudio to start a New File, specifically, an R Script. This will open a file in the upper left of RStudio.
  • Write the following in the new script, including the hash marks (#).
## My first R script in BIO 209W Spring 2023
## [add your name]
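Comments can also sit at the end of a line of code: R runs everything to the left of the # and ignores everything to the right. A small illustration:

```r
## A whole-line comment: R skips this line entirely
2 + 3    # R runs "2 + 3" and ignores this note
# 2 + 100   (commented out, so R never runs it)
```

“Commenting out” a line like the last one is a handy way to switch code off without deleting it.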

ALL OF THE WORK YOU DO SHOULD BE IN A SCRIPT.

1.5.4 Entering and running code

To enter code, place your cursor on any line in the script and type, like this:

-1:5

To run code, place your cursor anywhere on a line of code and press Control-Enter (Windows or Mac) or Command-Return (Mac).

Try it with this:

-1:5
[1] -1  0  1  2  3  4  5

This will pop up in the Console, which is the lower left pane in RStudio.

What did this create? Do you get what I got above?

1.5.5 Assignment operator

When we create something, we typically want to assign it to a named object. In this class, we use an arrow to show that we are putting a creation into an object. We type the arrow on our keyboard with a “less than” sign (Shift-comma) followed by a dash: <-.

a <- -1:5

This puts the series of integers from -1 to 5 into the object a.
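The colon operator in -1:5 counts by 1 from the first number to the second. Here are a few more assignments to try; seq() is a base R function that builds a sequence with whatever step size you choose. The object names up, down, and quarters are just examples.

```r
# Count up by 1 from -1 to 5 (the same values we stored in a)
up <- -1:5
up
# Count down by 1 from 5 to -1
down <- 5:-1
down
# seq() lets you choose the step size with the 'by' argument
quarters <- seq(from = 0, to = 1, by = 0.25)
quarters
```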

1.5.6 Examining objects

If we have an object, like a, how can we figure out what it is? There are several ways of doing that. The simplest is to “print” it out. This doesn’t mean that we print it on paper. Rather we are printing it to the console, like this:

# show or 'print' a
a
[1] -1  0  1  2  3  4  5

Another way is to look at its “structure”:

str(a)
 int [1:7] -1 0 1 2 3 4 5

This tells us that this is a vector of integers that is 7 elements long.

Another way is to look in the RStudio Environment pane in the upper right hand side of RStudio.
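A few other base R functions answer quick questions about an object. This sketch recreates a with the same values so it runs on its own.

```r
a <- -1:5
# How many elements does a have?
length(a)   # 7
# What kind of object is a?
class(a)    # "integer"
# A numeric summary: minimum, quartiles, median, mean, and maximum
summary(a)
```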

1.5.7 Things you’ll find out about R

If you are not used to writing code (and nobody is born that way), here are some things to keep in mind about R and about coding in general.

  1. Computers do exactly what you tell them. This is both frustrating and also satisfying.
  2. R does not know anything or do anything until you run the code. Typing or pasting code into a script is just the first step; you also need to run the code, with Control-return (Windows or Mac) or Command-return (Mac).
  3. A corollary of no. 2 is that you usually need to run lines of code in sequence. For instance, R won’t be able to run str(a) unless you have already run a line that defines a, such as a <- -1:5. R runs each line only when you tell it to (R is an interpreted language), so an object does not exist until the line that creates it has been run.
  4. Case matters in R. R interprets a as something completely different from A.
  5. Spaces between things usually don’t matter in R. R interprets 2+3 the same as 2 + 3.
  6. Spaces within things usually break them, just the way “thin” “gs” is not the same as “things”. For example, the pseudo-arrow we use in f <- 9 is a thing. The compound symbol <- means something different from < -.
    1. <- is the assignment operator and assigns values to objects.
    2. < - means “less than, minus”, where 2 < -3 tests whether 2 is less than minus 3 (it isn’t).

Try it.

f <- 9
f
[1] 9

Now f has the value of 9.

Try this:

2 < -3
[1] FALSE

That tested whether 2 is less than -3.
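Two more rules from the list above, case and spaces, can be checked the same way. The names x and X are just examples, chosen so they don’t overwrite a or f.

```r
# Case matters: x and X are two different objects
x <- 1
X <- 100
x
X
# Spaces between things do not matter: both lines give 5
2+3
2 + 3
# Comparing them directly returns TRUE
(2+3) == (2 + 3)
```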

Moving on….

Now let’s create a set of 7 uniformly distributed random numbers that are between 0 and 1.

b <- runif(7, min=0, max=1)
b
[1] 0.27906864 0.01167553 0.40677881 0.40230932 0.32470805 0.59276515 0.22565336

These will differ every time you do this.
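If you want the same “random” numbers each time (for example, so a classmate can reproduce your result), call set.seed() with any whole number before runif(). The seed 42 and the names b1 and b2 below are arbitrary choices, picked so they don’t overwrite b.

```r
# Fix the starting point of R's random number generator
set.seed(42)
b1 <- runif(7, min = 0, max = 1)
# Resetting the same seed reproduces exactly the same numbers
set.seed(42)
b2 <- runif(7, min = 0, max = 1)
identical(b1, b2)   # TRUE
```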

Now we multiply each element in a by the corresponding element in b.

# multiply a and b
ab <- a * b
## Show a, b, and ab
a
[1] -1  0  1  2  3  4  5
b
[1] 0.27906864 0.01167553 0.40677881 0.40230932 0.32470805 0.59276515 0.22565336
ab
[1] -0.2790686  0.0000000  0.4067788  0.8046186  0.9741242  2.3710606  1.1282668

You can confirm at a glance that the first and second elements in a times the first and second elements in b equal the first and second elements in ab.
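Multiplication with * works element by element when the two vectors have the same length. Small whole numbers make this easy to check by hand (x and y are example names):

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)
# Element 1 times element 1, element 2 times element 2, and so on
x * y        # 10 40 90
# Square brackets pull out one element, so we can check a single product
x[2] * y[2]  # 40
```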

1.5.8 Combining vectors into a data frame

We will use lots of data frames in this course. Think of a data frame as a kind of spreadsheet of data, in which every column has the same number of entries and every row is an observation comprising all of the variables across all the columns.

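Here is one way to create a data frame. Inside data.frame(), we name each column using “=”: for example, a_times_b = ab puts the values of ab into a column named a_times_b.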

d <- data.frame(a=a, b=b, a_times_b = ab)
d
   a          b  a_times_b
1 -1 0.27906864 -0.2790686
2  0 0.01167553  0.0000000
3  1 0.40677881  0.4067788
4  2 0.40230932  0.8046186
5  3 0.32470805  0.9741242
6  4 0.59276515  2.3710606
7  5 0.22565336  1.1282668

You could examine the structure of this data frame with str().

str(d)
'data.frame':   7 obs. of  3 variables:
 $ a        : int  -1 0 1 2 3 4 5
 $ b        : num  0.2791 0.0117 0.4068 0.4023 0.3247 ...
 $ a_times_b: num  -0.279 0 0.407 0.805 0.974 ...

You will also notice now that you can see all of our objects in the Environment pane, in the upper right of RStudio.
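A few more base R functions help you inspect a data frame. This sketch builds a small stand-in data frame named d_demo (a hypothetical name, so your d is left alone) and shows how to count its rows and columns, list its column names, and pull out a single column with $.

```r
# A stand-in data frame with the same shape as d (7 rows, 3 columns)
d_demo <- data.frame(a = -1:5, b = runif(7, min = 0, max = 1))
# Add a new column by multiplying two existing columns
d_demo$a_times_b <- d_demo$a * d_demo$b
nrow(d_demo)     # number of rows (observations): 7
ncol(d_demo)     # number of columns (variables): 3
names(d_demo)    # "a" "b" "a_times_b"
d_demo$a         # the column a on its own, as a vector
head(d_demo, 3)  # just the first 3 rows
```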

You can also export data frames from R; programmers call that “writing” a file. The command below saves our data frame as a .csv file that you could open in a spreadsheet application.

# export or write a file
write.csv(x=d, file="myDataframe.csv", row.names=FALSE)
# including row.names=FALSE prevents R from adding row names.

This should cause the data frame d to get saved to your working directory. Did it work for you?
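One way to check is to ask R whether the file exists and then read it back in with read.csv(). To keep the sketch self-contained, it writes a small stand-in data frame (demo_df, a hypothetical name) to a temporary file rather than to your working directory; to check your own work, run file.exists("myDataframe.csv") and read.csv("myDataframe.csv") instead.

```r
# A small stand-in data frame
demo_df <- data.frame(a = -1:5, a_squared = (-1:5)^2)
# tempfile() gives a throwaway path in R's temporary folder
demo_path <- tempfile(fileext = ".csv")
write.csv(x = demo_df, file = demo_path, row.names = FALSE)
# TRUE if the file was written
file.exists(demo_path)
# Read the file back in and look at its structure
demo_back <- read.csv(demo_path)
str(demo_back)
```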

1.5.9 Plotting data

Now let’s plot some data, including both points and lines connecting them.

# type='p' draws points only, type='l' draws lines only,
# and type='b' draws both points and connecting lines
plot(a, ab, type='b')

R is well known for its graphics capabilities, and we will only just scratch the surface in this course.
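The base plot() function also takes arguments for axis labels, a title, colors, and point shapes. The values below (the title, "blue", and pch = 19, a filled circle) are arbitrary choices, and x and y are stand-in vectors so the sketch runs on its own.

```r
x <- -1:5
y <- x * 0.5
# type = "b" draws both points and lines; xlab, ylab, and main set the text
plot(x, y, type = "b",
     xlab = "a", ylab = "Product of a and b",
     main = "My first R plot",
     col = "blue", pch = 19)
```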

1.6 Install an R package

R is open source, and its strength derives in part from the thousands of scientists creating extensions or add-on packages to do particular tasks. By itself, R can do amazing things. With thousands of contributed packages (as they are known), R’s capabilities have exploded.

You can add a package in a few ways, including:

  • Use the Tools menu, select “Install Packages…”, and type the name of the package you want.
  • Use the function install.packages("packagename"), replacing packagename with the name of the package and keeping the quotation marks.

In this class, you will need to install several packages. Let’s try installing “ggplot2”.

install.packages("ggplot2")

This will install ggplot2 along with all of its dependencies, the other packages that ggplot2 needs in order to work.

Once you have installed a package, it is on your computer for good. You do not have to install it again, although you may want to update it once in a while to stay current.
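To see whether a package is already on your computer before installing it, you can check the list returned by installed.packages(). Base packages such as stats come with R, so they are always in that list. update.packages() updates installed packages; it is commented out here because it downloads from CRAN and may ask you to pick a mirror.

```r
# Names of every package installed on this computer
installed <- rownames(installed.packages())
# TRUE if ggplot2 is installed, FALSE if you still need install.packages("ggplot2")
"ggplot2" %in% installed
# stats ships with R, so this is always TRUE
"stats" %in% installed
# Bring installed packages up to date (uncomment to run)
# update.packages()
```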

Now let’s use the plotting functions in the ggplot2 package.

1.7 Load and use an R package

To use a contributed package, we need to load it. This simply means waking it up and putting its code into R’s working memory. We do that with library().

library(ggplot2)

Here we use ggplot() and geom_line() to plot our data and connect the points with a line.

When we use ggplot(), we give it the name of our data frame, and we list the variables we want to use inside aes(), which stands for “aesthetics”. We also begin to add bells and whistles, such as axis labels with labs().

## plot data and connect the points with a line
ggplot(data=d, aes(x=a, y=a_times_b)) + 
  geom_line() +
  labs(x="a", y="Product of a and b")

Note that with ggplot, we use the plus sign (+) to add elements to our graph.
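Each + adds another layer or setting. As a sketch, here is the same kind of plot with points added, a straight fitted line from geom_smooth(method = "lm") (a linear regression, one of several smoothing methods ggplot2 offers), and a plain theme. d_demo is a hypothetical stand-in data frame so the example runs on its own; with your own data, use d instead.

```r
library(ggplot2)
# Stand-in data with the same column names as d
d_demo <- data.frame(a = -1:5, b = runif(7, min = 0, max = 1))
d_demo$a_times_b <- d_demo$a * d_demo$b

p <- ggplot(data = d_demo, aes(x = a, y = a_times_b)) +
  geom_line() +                             # connect the observations
  geom_point() +                            # mark each observation
  geom_smooth(method = "lm", se = FALSE) +  # straight-line fit, no ribbon
  labs(x = "a", y = "Product of a and b") +
  theme_bw()                                # white background with grid lines
print(p)
```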

1.7.1 Saving your graphs

We can save this figure as well, in a variety of formats and sizes. By default, these are saved to our working directory, though we can change that if we want to.

ggsave("myPlot.png")
Saving 7 x 5 in image
ggsave("myPlot.jpg", width=7, height = 3)
ggsave("myPlot.pdf", width=3, height = 5)
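ggsave() also lets you name the plot explicitly and choose the units and resolution. In this sketch the file name myPlot_cm.png, the 15 x 10 cm size, and 300 dots per inch are arbitrary choices; plot = tells ggsave() which figure to save instead of the most recent one.

```r
library(ggplot2)
# A small stand-in plot so the example runs on its own
p <- ggplot(data = data.frame(a = -1:5, y = (-1:5) * 0.5), aes(x = a, y = y)) +
  geom_line()
# Save p as a 15 cm x 10 cm PNG at 300 dots per inch
ggsave("myPlot_cm.png", plot = p, width = 15, height = 10, units = "cm", dpi = 300)
# TRUE if the file is now in the working directory
file.exists("myPlot_cm.png")
```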

1.8 Getting help

R has a bit of a learning curve. What should you do if you get stuck? It all depends what sort of problem it is.

  1. First, ask a classmate.
  2. Use the Help tab in the lower right pane in RStudio. (Similarly, type ?mean in the Console to get help on the function mean().)
  3. Ask another classmate.
  4. Ask me.
  5. Search online.
  6. Did I mention you could ask me?

1.9 Deliverables

For the “First R script” assignment on Canvas:

  1. Upload three files to the Canvas First R Script assignment:
    1. myPlot.png
    2. myDataframe.csv
    3. the R script, into which you wrote all your code. This file must be written in RStudio, and the name must end with the file name extension, “.R”, such as “Hanks_script.R”
  2. Explain your biggest surprise in this assignment.

Congratulations. C’est fini!