Packages

Basic Dplyr

Does a lot of the same things we’ve learned to do in SQL.

surveys <- read.csv("surveys.csv")

Problem 3, 1-5

Group and Summarize

Like in SQL we can also group by and aggregate

summarize(surveys_by_species, avg_weight = mean(weight))

Problem 3, 6-7

Pipes

surveys_DS <- filter(surveys, species_id == 'DS')`
surveys_DS_by_yr <- group_by(surveys_DS, year)
surveys_DS_by_yr_avg_weight <- summarize(surveys_DS_by_yr, avg_weight = mean(weight))

surveys %>% filter(species_id == 'DS')

surveys %>% filter(species_id == 'DS') %>% group_by(year) %>% summarize(avg_weight = mean(weight))
surveys %>%
  filter(species_id == 'DS') %>%
  group_by(year) %>%
  summarize(avg_weight = mean(weight))

Using Dplyr with databases

We can also use Dplyr to access data directly from a database.

portaldb <- src_sqlite("portal_mammals.sqlite")
surveys <- tbl(portaldb, "surveys")
surveys
query <- "SELECT year, month, day, genus, species
          FROM surveys JOIN species
          ON surveys.species_id = species.species_id"
tbl(portaldb, sql(query))