Set up R console:
library(tidyr)
Remember the five basic rules of database structure
- Order doesn’t matter
- No duplicate rows
- Every cell contains one value
- One column per type of information
- No redundant information
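Two of these rules can be checked mechanically. The sketch below is illustrative only: `pets` is a made-up data frame, not part of the lesson data, and the `;` separator is an assumption about how multiple values might be packed into a cell. It uses base R to flag repeated rows (rule 2) and cells that appear to hold more than one value (rule 3).

```r
# Hypothetical example data, invented for illustration only
pets <- data.frame(
  owner = c("Dorothy", "Dorothy", "Glinda"),
  pet   = c("Toto", "Toto", "cat;owl"),
  stringsAsFactors = FALSE
)

# Rule 2: duplicated() marks each row that repeats an earlier row
dup_rows <- duplicated(pets)
dup_rows

# Rule 3: a separator inside a cell suggests more than one value
multi_value <- grepl(";", pets$pet, fixed = TRUE)
multi_value
```

A check like this only finds the separators you already know to look for, so it complements reading the table rather than replacing it.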
Restructure tables with messy data
- Cells with multiple values break rule #3.
- Redundant column information or cross-tabulated data breaks rule #4.
- Here is another messy dataset.
scary_sightings <- data.frame(
  animals = c("lions", "tigers", "bears"),
  brick_road = c("1-Y", "0-N", "0-N"),
  emerald_city = c("17-N", "8-Y", "64-N")
)
> scary_sightings
  animals brick_road emerald_city
1   lions        1-Y         17-N
2  tigers        0-N          8-Y
3   bears        0-N         64-N
- What do the values in the table represent?
  - lions, tigers, and bears are names of animals
  - 1-Y, 17-N, etc. represent:
    - Counts of animals sighted on the brick_road or in the emerald_city
    - Whether the animal sightings were scary: Y or N
Ask students:
- “What makes scary_sightings messy?”
- “What are the variables in scary_sightings?”
- Tidy variables in scary_sightings
  - animals: lions, tigers, and bears
  - site: brick_road and emerald_city
  - sightings: count
  - scared: Y or N
tidyr helps restructure messy data
gather()
- Collapses redundant columns into key-value pairs
- Arguments:
  - Piped data.frame
  - Column name for grouping of old column headers
  - Column name for grouping of old column values
  - Column range for old columns with values
less_scary <- scary_sightings %>%
  gather(site, scary_counts, brick_road:emerald_city)
> less_scary
  animals         site scary_counts
1   lions   brick_road          1-Y
2  tigers   brick_road          0-N
3   bears   brick_road          0-N
4   lions emerald_city         17-N
5  tigers emerald_city          8-Y
6   bears emerald_city         64-N
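In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), although gather() still works. As a sketch, assuming a tidyr version that provides pivot_longer(), the same reshaping could be written as below. The name less_scary_longer is made up for this example, and the data frame is repeated so the block runs on its own.

```r
library(tidyr)

# Same messy data as in the lesson
scary_sightings <- data.frame(
  animals = c("lions", "tigers", "bears"),
  brick_road = c("1-Y", "0-N", "0-N"),
  emerald_city = c("17-N", "8-Y", "64-N")
)

# less_scary_longer is a hypothetical name used only in this sketch
less_scary_longer <- scary_sightings %>%
  pivot_longer(
    cols = brick_road:emerald_city,  # old columns holding values
    names_to = "site",               # new column for old column headers
    values_to = "scary_counts"       # new column for old column values
  )

less_scary_longer
```

The rows come out in a different order than with gather(): pivot_longer() keeps each animal's rows together instead of stacking one site after the other.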
separate()
- Separates multiple values in a single column
- Arguments:
  - Piped data.frame
  - Column name
  - New column names
  - Separator value or character
sightings <- less_scary %>%
  separate(scary_counts, c("count", "scary"), sep = "-")
> sightings
  animals         site count scary
1   lions   brick_road     1     Y
2  tigers   brick_road     0     N
3   bears   brick_road     0     N
4   lions emerald_city    17     N
5  tigers emerald_city     8     Y
6   bears emerald_city    64     N
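After separate(), count is still a character column, so it cannot be summed or averaged directly. One way to handle this, shown here as a minimal sketch, is the convert argument of separate(), which asks tidyr to guess a type for each new column. The name sightings_typed is made up for this example, and both steps are chained and the data repeated so the block runs on its own.

```r
library(tidyr)

# Same messy data as in the lesson
scary_sightings <- data.frame(
  animals = c("lions", "tigers", "bears"),
  brick_road = c("1-Y", "0-N", "0-N"),
  emerald_city = c("17-N", "8-Y", "64-N")
)

# sightings_typed is a hypothetical name used only in this sketch
sightings_typed <- scary_sightings %>%
  gather(site, scary_counts, brick_road:emerald_city) %>%
  separate(scary_counts, c("count", "scary"), sep = "-", convert = TRUE)

# count is now numeric, so totals can be computed
class(sightings_typed$count)
total_sighted <- sum(sightings_typed$count)
total_sighted
```

Without convert = TRUE, the same result could be reached by calling as.integer() on the count column afterwards.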