There were a relatively large number of extinctions of mammalian species roughly 10,000 years ago. To help understand why these extinctions happened, scientists want to know whether body size differed between the species that went extinct and those that did not.
To address this question we can use the
largest dataset on mammalian body size in the world,
which has data on the mass of recently extinct mammals as well as extant mammals
(i.e., those that are still alive today). Take a look at the
metadata to
understand the structure of the data. One key thing to remember is that species
can occur on more than one continent, and if they do then they will occur more
than once in this dataset. Also let’s ignore species that went extinct in the
very recent past (designated by the word "historical"
in the status
column).
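Once the data are loaded and the columns are named, dropping the recently extinct species might look like the sketch below. The data frame here is a made-up stand-in with only two columns, and the name `mammals_no_historical` is an assumption chosen for illustration.

```r
library(dplyr)

# Made-up stand-in for the real data; only the columns needed here.
toy_mammals <- data.frame(
  status = c("extant", "extinct", "historical", "extant"),
  combined_mass = c(10, 500, 40, 25)
)

# Keep every row whose status is anything other than "historical".
mammals_no_historical <- filter(toy_mammals, status != "historical")

print(mammals_no_historical)
```

The same `filter()` call should apply to the full dataset as long as its status column is named `status`.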
Import the data into R. If you’ve looked at a lot of data you’ll realize
that this dataset is tab delimited. Use the argument sep = "\t" in read.csv()
to properly format the data. There is no header row, so use header = FALSE.
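As a rough illustration of the import step, the sketch below writes two made-up tab-delimited rows to a temporary file and reads them back. The rows and the temporary file are assumptions for demonstration only; with the real data you would pass the path of the downloaded file instead.

```r
# Two made-up rows in the same nine-column, tab-delimited layout (not real data).
toy_lines <- c(
  "AF\textant\tOrderA\tFamilyA\tGenusA\talpha\t2.5\t316\t1",
  "EA\textinct\tOrderB\tFamilyB\tGenusB\tbeta\t4.0\t10000\t2"
)
toy_file <- tempfile(fileext = ".txt")
writeLines(toy_lines, toy_file)

# sep = "\t" splits on tabs; header = FALSE keeps the first row as data.
# (header is the full argument name; the abbreviation head also resolves
# to it through R's partial argument matching.)
mammal_sizes <- read.csv(toy_file, sep = "\t", header = FALSE)

# Without a header row, R assigns the default names V1, V2, ...
print(names(mammal_sizes))
print(dim(mammal_sizes))
```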
Add column names to help identify columns.
```
colnames(mammal_sizes) <- c("continent", "status", "order",
                            "family", "genus", "species", "log_mass",
                            "combined_mass", "reference")
```
To start let’s explore the data a little and then start looking at the major question.
The following dplyr
code will determine how many genera (plural of genus) are
in the dataset:
```
nrow(distinct(select(mammal_sizes, genus)))
```
Modify this code into a function to determine the number of species. Remember that a species is uniquely defined by the combination of its genus name and its species name. Print the result to the screen. The number should be between 4000 and 5000.
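One possible shape for the species count is sketched below on a made-up data frame. The function name `count_species` and the toy rows are assumptions for illustration. The idea is to select both the genus and species columns, so that `distinct()` compares the pair rather than either name alone.

```r
library(dplyr)

# Made-up rows: GenusA alpha appears on two continents.
toy <- data.frame(
  continent = c("AF", "EA", "AF", "EA"),
  genus     = c("GenusA", "GenusA", "GenusA", "GenusB"),
  species   = c("alpha", "alpha", "beta", "alpha")
)

# Hypothetical helper: count unique genus + species combinations.
count_species <- function(df) {
  nrow(distinct(select(df, genus, species)))
}

n_genera <- nrow(distinct(select(toy, genus)))
n_species <- count_species(toy)

print(n_genera)
print(n_species)
```

Counting on the species column alone would merge GenusA alpha with GenusB alpha, which is why the pair is used.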
mean()
should help you here.
Don’t worry about species that occur more than once. We’ll consider
the values on different continents to represent independent data points.
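A minimal sketch of averaging mass by status is given below. It assumes the goal is one mean per status value, and it uses made-up numbers. Rows are deliberately not de-duplicated, so a species recorded on two continents contributes two data points.

```r
library(dplyr)

# Made-up masses; the first two rows stand for one species on two continents.
toy <- data.frame(
  status        = c("extant", "extant", "extant", "extinct", "extinct"),
  genus         = c("GenusA", "GenusA", "GenusB", "GenusC", "GenusD"),
  species       = c("alpha", "alpha", "beta", "gamma", "delta"),
  combined_mass = c(10, 20, 30, 100, 300)
)

# Group the rows by status and take the mean mass within each group.
mean_by_status <- toy %>%
  group_by(status) %>%
  summarize(mean_mass = mean(combined_mass))

print(mean_by_status)
```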